Google presents AtP
An efficient and scalable method for localizing LLM behaviour to components.
Activation Patching is a method of directly computing causal attributions of behavior to model components.
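To make the idea concrete, here is a minimal sketch of plain activation patching on a toy two-layer ReLU network. It is not the paper's AtP method; it is only the baseline that the sentence above describes. The network, its weights, and the names `forward` and `activation_patching` are hypothetical and chosen only for illustration.

```python
# Toy illustration only: a hand-written two-layer ReLU network,
# not a real LLM and not the method proposed in the paper.

def forward(x, W1, W2, patch=None):
    """Run the toy network; `patch` maps a hidden-unit index to a forced activation."""
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]
    for i, v in (patch or {}).items():
        h[i] = v  # overwrite the chosen component's activation
    y = sum(w * hi for w, hi in zip(W2, h))
    return y, h

def activation_patching(x_clean, x_corrupt, W1, W2):
    """For each hidden unit, patch its activation from the corrupted run
    into the clean run and record the resulting change in the output."""
    y_clean, _ = forward(x_clean, W1, W2)
    _, h_corrupt = forward(x_corrupt, W1, W2)
    effects = []
    for i in range(len(W1)):  # one extra forward pass per component
        y_patched, _ = forward(x_clean, W1, W2, patch={i: h_corrupt[i]})
        effects.append(y_patched - y_clean)
    return effects

# Hypothetical weights and inputs (assumptions for the example).
W1 = [[1.0, -1.0], [0.5, 2.0], [-1.0, 1.0]]
W2 = [2.0, -1.0, 0.5]
x_clean = [1.0, 0.0]
x_corrupt = [0.0, 1.0]

effects = activation_patching(x_clean, x_corrupt, W1, W2)
print(effects)  # [-2.0, -1.5, 0.5]
```

Each entry is the causal effect on the output of swapping in one component's corrupted activation. The loop needs one forward pass per component, which is the cost that grows with model size.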
Join the discussion on this paper page.