2025-12-01 | Reproducing Neo et al.
Goal: Reproduce Neo et al. 2024
Summary: Allow the tokenizer to process batch size > 1
Work sessions
| In | Out |
|---|---|
| 00:00 | 00:35 |
| 15:30 | 16:30 |
Reproducing Neo et al.
- Allow the tokenizer to handle batches of multiple prompts
Tokenizer Padding ID
Padding: GPT-2 does not have a padding token by default; use the end-of-sequence (EOS) token as the pad token instead.
Thus, when tokenizing two prompts together, padding extends the shorter prompt to the length of the longest prompt in the batch, and the attention mask marks the padded positions.
"hi there" --> [5303, 612, 50256, 50256, 50256] # See that 50256 is the padding token
"what time is it?" --> [10919, 640, 318, 340, 30]
Therefore, the attention mask is zeroed out over the padded positions of "hi there"
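The pad-to-longest behavior above can be sketched in plain Python (illustration only; the actual work uses the Hugging Face GPT-2 tokenizer with `tokenizer.pad_token = tokenizer.eos_token` — the helper name `pad_batch` here is hypothetical):

```python
EOS_ID = 50256  # GPT-2's end-of-sequence token, reused as the pad token

def pad_batch(token_lists):
    """Right-pad each sequence to the longest in the batch with EOS_ID,
    and build the matching attention mask (1 = real token, 0 = padding)."""
    max_len = max(len(t) for t in token_lists)
    input_ids = [t + [EOS_ID] * (max_len - len(t)) for t in token_lists]
    attention_mask = [[1] * len(t) + [0] * (max_len - len(t)) for t in token_lists]
    return input_ids, attention_mask

# Token IDs taken from the example above
ids, mask = pad_batch([[5303, 612], [10919, 640, 318, 340, 30]])
# ids[0]  == [5303, 612, 50256, 50256, 50256]
# mask[0] == [1, 1, 0, 0, 0]
```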
- Right-truncate long prompts with `truncation=True` in `tokenizer(prompt, return_tensors="pt", padding=True, truncation=True)`
- OOM (Out of Memory) issues with high-batch-size inference
- Solution: use Colab with VS Code to get GPU support
- Ensure that tensors are on the same device during computation
- Results: 4000 examples in 4 min, vs. 2000 before
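To keep GPU memory bounded at high throughput, the examples can be fed through in fixed-size chunks. A minimal sketch (the helper name `batched` and the batch size are assumptions, not from the run itself):

```python
def batched(examples, batch_size):
    """Yield successive fixed-size chunks of the example list so that
    each forward pass fits in GPU memory; batch_size is tunable."""
    for i in range(0, len(examples), batch_size):
        yield examples[i:i + batch_size]

# With 4000 examples and a batch size of 32, this yields 125 chunks.
chunks = list(batched(list(range(4000)), 32))
```

Each chunk would then be tokenized with `padding=True, truncation=True` and moved to the GPU before the forward pass, so all tensors live on the same device.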
Reading List
- Add *How Can Interpretability Researchers Help AGI Go Well?* to reading list
Next Steps
- Finish Section 4.2 (activating neurons)
- Write out the prompts to a serialized form (e.g. JSON/CSV)
- Truncate prompts to be 80% activation only
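One way the JSON serialization step could look, as a sketch with a hypothetical schema (field names `prompt` and `token_ids` are assumptions; token IDs are the examples from above):

```python
import json

# Hypothetical record format: one entry per prompt with its GPT-2 token IDs.
prompts = [
    {"prompt": "hi there", "token_ids": [5303, 612]},
    {"prompt": "what time is it?", "token_ids": [10919, 640, 318, 340, 30]},
]

# Round-trip through JSON to confirm the records survive serialization.
serialized = json.dumps(prompts, indent=2)
restored = json.loads(serialized)
```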