Google presents AtP
An efficient and scalable method for localizing LLM behaviour to components.
Activation Patching is a method of directly computing causal attributions of behavior to model components.
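To make the idea concrete, here is a minimal sketch of activation patching on a toy model. It is not the paper's code. The two-component model, the `components`, `readout`, and `run` helpers, and the clean and corrupted inputs are all hypothetical, chosen only for illustration. The sketch uses one common convention: cache each component's activation from a clean run, then rerun the corrupted input with a single component overwritten by its clean value, and record how much the output moves.

```python
# Toy activation patching sketch. Everything here (model, inputs, helper
# names) is a hypothetical illustration, not the method's actual implementation.

def components(x):
    """Two hypothetical 'components': ReLU units over a 2-d input."""
    return [max(0.0, x[0] - x[1]), max(0.0, x[0] + x[1])]

def readout(h):
    """Hypothetical linear readout that turns activations into an output."""
    return 3.0 * h[0] + 1.0 * h[1]

def run(x, patch=None):
    """Forward pass; `patch` maps component index -> activation to force."""
    h = components(x)
    for index, value in (patch or {}).items():
        h[index] = value  # overwrite this component's activation
    return readout(h)

clean_input = [2.0, 1.0]      # assumed input where the behaviour is present
corrupted_input = [0.0, 1.0]  # assumed input where the behaviour is absent

clean_acts = components(clean_input)   # cache activations from the clean run
baseline = run(corrupted_input)        # corrupted output with no patching

# One extra forward pass per component: patch in the clean activation and
# attribute the resulting change in output to that component.
effects = [
    run(corrupted_input, patch={i: clean_acts[i]}) - baseline
    for i in range(len(clean_acts))
]

for i, effect in enumerate(effects):
    print(f"component {i}: effect of patching = {effect}")
```

The sweep needs one additional forward pass per component, so its cost grows with the number of components being tested.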
Join the discussion on this paper page.