2026-01-09 | ARBOx Day 5

Goal: Day 5 of ARBOx: Indirect Object Identification Paper Reproduction

Summary: ARENA 1.4.1 Notebook: Circuits in IOI

Work sessions

In Out
10:00 18:00

Learning about circuits was one of the coolest things in the MechInterp section of ARBOx. To me, this is the clearest evidence that models implement very non-trivial algorithms and are not just "stochastic parrots".

Left an hour early to go to the LISA ARENA x ARBOx social.

Key concepts that I will need to revisit:

  1. Logit, Head, Layer Attribution, Logit Diffs

  2. Activation Patching

  3. Path Patching
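
As a starting point for revisiting item 1, here is a minimal sketch of a logit difference for an IOI-style prompt. The three-token vocabulary, the logit values, and the helper name `logit_diff` are all hypothetical, chosen only for illustration; none of them come from the notebook or a real model.

```python
def logit_diff(logits, correct_id, incorrect_id):
    """Return the correct token's logit minus the incorrect token's logit."""
    return logits[correct_id] - logits[incorrect_id]


# Hypothetical toy vocabulary and made-up final-position logits for a prompt like
# "When Mary and John went to the store, John gave a drink to"
vocab = {"Mary": 0, "John": 1, "the": 2}
logits = [3.5, 1.0, 0.25]

# Indirect object (Mary) versus subject (John): positive means the model prefers Mary.
diff = logit_diff(logits, vocab["Mary"], vocab["John"])
print(diff)  # 2.5
```

A positive value means the indirect object is preferred over the subject. Patching experiments typically track how this single number changes when activations are swapped between a clean and a corrupted run.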