2026-04-28 | MOLT Steering

Goal: Steering with MOLT features and the top-activating-prompts dashboard; see also the PR for MOLTs to master and the WandB runs

Summary: Feedback from Georg on using a translation task to verify necessary and sufficient conditions

Work sessions

In     Out
00:30  01:20
02:00  06:00

Feedback on Scaling

  • Train MOLTs on all layers of GPT-2 and Gemma-3-4B-IT

Necessary and Sufficient

  1. Necessary --> If we ablate this transform, does the expected behavior go away?

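  2. Sufficient --> If we activate this transform (e.g., by steering with it), does the expected behavior appear? Note that "the feature is active whenever the behavior occurs" is an observational check on necessity, not on sufficiency.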
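
A minimal sketch of how both checks could be run as interventions, on a toy model rather than the real training code. It assumes a MOLT layer can be written as a sum of ReLU-gated linear maps, and that "behavior" can be scored as a projection onto a chosen direction; every name here (`molt_output`, `behavior_score`, the toy weights) is hypothetical. Necessity is probed by zeroing one gate, sufficiency by clamping that gate on for an input where it is normally inactive.

```python
import numpy as np

def molt_output(x, encoders, biases, maps, ablate=None, force=None):
    """Toy mixture of linear transforms: sum_k gate_k(x) * (maps[k] @ x).

    ablate: index of a transform whose gate is set to zero (necessity probe).
    force:  (index, value) pair clamping one gate to a value (sufficiency probe).
    """
    gates = np.maximum(encoders @ x - biases, 0.0)  # ReLU gates, shape (K,)
    if ablate is not None:
        gates[ablate] = 0.0
    if force is not None:
        k, value = force
        gates[k] = value
    return np.einsum("k,kij,j->i", gates, maps, x)

def behavior_score(y, direction):
    """Hypothetical behavior metric: projection of the output onto a direction."""
    return float(direction @ y)

# Toy weights (assumed, not taken from any trained model).
encoders = np.array([[1.0, 0.0], [0.0, 1.0]])
biases = np.array([0.5, 0.0])
maps = np.array([
    [[0.0, 0.0], [1.0, 0.0]],   # transform 0 writes x[0] into the behavior dim
    [[0.5, 0.0], [0.0, 0.0]],   # transform 1 is unrelated to the behavior
])
direction = np.array([0.0, 1.0])

x_on = np.array([1.0, 0.0])     # transform 0 fires here
x_off = np.array([0.25, 1.0])   # transform 0 is inactive here

# Necessity: ablating transform 0 should remove the behavior on x_on.
base_on = behavior_score(molt_output(x_on, encoders, biases, maps), direction)
ablated_on = behavior_score(
    molt_output(x_on, encoders, biases, maps, ablate=0), direction)

# Sufficiency: clamping transform 0 on should induce the behavior on x_off.
base_off = behavior_score(molt_output(x_off, encoders, biases, maps), direction)
steered_off = behavior_score(
    molt_output(x_off, encoders, biases, maps, force=(0, 2.0)), direction)

print(base_on, ablated_on, base_off, steered_off)
```

For the translation task, the same two probes would apply with the behavior score replaced by a task metric, which is left open here.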

WandB Runs

  1. Scaling MOLT Multiplier and Tokens:

  2. MOLT Training with Varying Sparsity: sparsity settings of 1e-4 and 1.5e-4 give a balanced trade-off between MSE and L0

  3. BF16 Mixed Precision Training

  4. Rebase MOLT on Master Branch: Verify that runs after the rebase are essentially the same as the runs before it
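
One way to make "essentially the same" concrete for the rebase check is a per-step relative tolerance on the logged metrics. The sketch below is an assumption about how that comparison might be done: `runs_match` is a hypothetical helper, the 2% tolerance is arbitrary, and the curves are placeholder numbers, not values from the actual runs.

```python
import numpy as np

def runs_match(before, after, rtol=0.02):
    """True if two metric curves agree within a relative tolerance at every step."""
    before = np.asarray(before, dtype=float)
    after = np.asarray(after, dtype=float)
    if before.shape != after.shape:
        return False  # different number of logged steps
    # Checks |after - before| <= rtol * |before| elementwise.
    return bool(np.allclose(after, before, rtol=rtol, atol=0.0))

# Placeholder loss curves for illustration only.
pre_rebase = [1.00, 0.60, 0.40]
post_rebase_ok = [1.01, 0.595, 0.402]
post_rebase_bad = [1.00, 0.70, 0.40]

print(runs_match(pre_rebase, post_rebase_ok))
print(runs_match(pre_rebase, post_rebase_bad))
```

The same helper could be applied separately to the MSE and L0 curves, since a rebase regression might show up in only one of them.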