
2025-12-01 | Reproducing Neo et al.

Goal: Reproduce Neo et al. 2024

Summary: Allow tokenizer to process Batch Size > 1

Work sessions

In Out
00:00 00:35
15:30 16:30

Reproducing Neo et al.

  1. Allow the tokenizer to process batches with more than one prompt
Tokenizer Padding ID
  • padding: GPT-2 does not have a padding token by default
  • instead, reuse the end-of-sequence (EOS) token as the pad token
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id
    

Thus, when tokenizing two prompts together, padding extends the shorter prompt to the length of the longest prompt in the batch, and the attention mask marks the padded positions

"hi there" --> [5303, 612, 50256, 50256, 50256] # See that 50256 is the padding token
"what time is it?" --> [10919, 640, 318, 340, 30]

Therefore, the attention mask is zeroed out over the padded positions of "hi there"

"hi there" --> [1, 1, 0, 0, 0]
"what time is it?" --> [1, 1, 1, 1, 1]

  2. Right-truncate long prompts with truncation=True in tokenizer(prompt, return_tensors="pt", padding=True, truncation=True)
  3. OOM (Out of Memory) issues with high-batch-size inference
    • Solution: use Colab with VS Code to get GPU support
    • Ensure that all tensors are on the same device during computation
    • Results: ~4,000 examples in 4 minutes, vs. ~2,000 before
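One standard way to keep peak memory bounded during batched inference is to feed the prompts through in fixed-size chunks rather than all at once; a minimal pure-Python sketch (the chunk size of 4 is illustrative, not from these notes):

```python
# Sketch: split a prompt list into fixed-size chunks so each forward
# pass sees at most batch_size prompts, keeping peak GPU memory bounded.
def chunked(items, batch_size):
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

prompts = [f"prompt {i}" for i in range(10)]
batches = list(chunked(prompts, 4))
print([len(b) for b in batches])  # [4, 4, 2]
# Each batch would then be tokenized, moved to the GPU with .to(device)
# alongside the model, and run through model.generate / forward.
```

Moving both the model and each input batch to the same device (e.g. model.to("cuda") and batch.to("cuda")) is what resolves the "tensors on different devices" errors mentioned above.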

Reading List

Add How Can Interpretability Researchers Help AGI Go Well? to reading list

Next Steps

  1. Finish Section 4.2 (activating neurons)
  2. Write out the prompts to a serialized form (e.g. JSON/CSV)
  3. Truncate prompts to be 80% activation only