Towards Automated Circuit Discovery for Mechanistic Interpretability

Authors: Arthur Conmy, Augustine N. Mavor-Parker, Aengus Lynch, Stefan Heimersheim, Adrià Garriga-Alonso

Publication Date: 2023-10-28

Full Paper: Towards Automated Circuit Discovery for Mechanistic Interpretability

In-Progress Reproduction: GitHub Repo

Notes

Keywords

  1. Mechanistic interpretability (mech interp) -> "reverse-engineering model components into human-understandable algorithms"

  2. Models = computational graphs that perform a task; circuits = "subgraphs with distinct functionality"
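
To make the graph/subgraph vocabulary concrete, here is a minimal sketch in Python. It is a toy illustration, not the paper's implementation: the node names (`head_0.0`, `mlp_1`, and so on), the edge sets, and the helper functions `is_subgraph` and `connects` are all hypothetical. The model is represented as a set of directed edges between components, and a candidate circuit is a subset of those edges that still carries information from input to output.

```python
# Toy computational graph: nodes are model components, edges are the
# dependencies between them. All names below are hypothetical.
model_edges = {
    ("input", "head_0.0"),
    ("input", "head_0.1"),
    ("input", "mlp_0"),
    ("head_0.0", "head_1.0"),
    ("head_0.1", "mlp_1"),
    ("mlp_0", "head_1.0"),
    ("head_1.0", "output"),
    ("mlp_1", "output"),
}

# A candidate circuit: a small set of edges picked out of the full graph.
circuit_edges = {
    ("input", "head_0.0"),
    ("head_0.0", "head_1.0"),
    ("head_1.0", "output"),
}


def is_subgraph(circuit, model):
    """A circuit is a subgraph if every one of its edges exists in the model."""
    return circuit <= model


def connects(edges, source, sink):
    """Return True if `sink` is reachable from `source` along `edges`."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
    stack, seen = [source], set()
    while stack:
        node = stack.pop()
        if node == sink:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(children.get(node, []))
    return False


if __name__ == "__main__":
    print(is_subgraph(circuit_edges, model_edges))
    print(connects(circuit_edges, "input", "output"))
```

Whether a subgraph has "distinct functionality" cannot be read off the graph structure alone; it has to be measured on a task. The sketch only captures the structural part of the definition.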