
Paper page — AtP*: An efficient and scalable method for localizing LLM behaviour to components


Google presents AtP*: an efficient and scalable method for localizing LLM behaviour to components.

Activation Patching is a method of directly computing causal attributions of behavior to model components.
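To make the idea concrete, here is a minimal sketch of activation patching on a toy network. Everything in it is an assumption made for illustration: the two-layer model, its weights, the inputs, and the helper names `hidden`, `output`, and `patch_effect` are hypothetical and are not taken from the paper. The sketch runs the model on a clean input and a corrupted input, then replaces one hidden unit's clean activation with its corrupted value and records how much the output changes.

```python
# Toy illustration of activation patching.
# The network, weights, and inputs below are hypothetical.

def relu(x):
    return max(0.0, x)

# Hypothetical network: 2 inputs -> 3 hidden units -> 1 output.
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one row of weights per hidden unit
W2 = [2.0, -1.0, 0.5]                      # output weights, one per hidden unit

def hidden(x):
    """Hidden activations for input x."""
    return [relu(sum(w * xi for w, xi in zip(row, x))) for row in W1]

def output(h):
    """Scalar output computed from hidden activations h."""
    return sum(w * hi for w, hi in zip(W2, h))

def patch_effect(x_clean, x_corrupt, unit):
    """Change in output when one hidden unit's clean activation is
    replaced by its activation from the corrupted run."""
    h_clean = hidden(x_clean)
    h_corrupt = hidden(x_corrupt)
    h_patched = list(h_clean)
    h_patched[unit] = h_corrupt[unit]  # the patch itself
    return output(h_patched) - output(h_clean)

x_clean = [1.0, 2.0]
x_corrupt = [3.0, 0.0]

# One patched forward pass per component.
effects = [patch_effect(x_clean, x_corrupt, u) for u in range(len(W1))]
print(effects)
```

In this sketch every component needs its own patched forward pass, so the cost of a full sweep grows with the number of components being tested.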

