Running Todo List

Math

Information Theory

  • [Added 2025-10-27] Surprisal vs. Perplexity vs. Entropy vs. Cross-Entropy

Mechanistic Interpretability

  • [Added 2025-10-21] Learn ARENA curriculum

Seminal Mech Interp Papers

Mech Interp Concepts to Master

  1. From the IOI paper: (1) logit, head, and layer attribution, plus logit diffs; (2) activation patching; (3) path patching

Mech Interp Safety Application

Automated Circuit Discovery

Interp for Monitoring/Control

RL

Reading

Documentation

  • [Added 2025-10-24] Add section on current project (Dyck Interp Probe)
  • [Completed 2025-10-23] Document progress from 2025-10-17 to 2025-10-20