2026-01-09 | ARBOx Day 5
Goal: Day 5 of ARBOx: Indirect Object Identification Paper Reproduction
Summary: ARENA 1.4.1 Notebook: Circuits in IOI
Work sessions
| In | Out |
|---|---|
| 10:00 | 18:00 |
Learning about circuits was one of the coolest things in the MechInterp section of ARBOx. To me, this is the clearest evidence that models implement very non-trivial algorithms and are not just "stochastic parrots".
Left 1 hour early to go to the LISA ARENA x ARBOx social.
Key concepts that I will need to revisit:
- Logit, Head, Layer Attribution, Logit Diffs
- Activation Patching
- Path Patching