2026-04-28 | MOLT Steering
Goal: Steering with MOLT features and the top-activating-prompts dashboard; see also the PR for MOLTs to master and the WandB runs
Summary: Feedback from Georg on using a translation task to verify necessary and sufficient conditions
Work sessions
| In | Out |
|---|---|
| 00:30 | 01:20 |
| 02:00 | 06:00 |
Feedback on Scaling
- Train MOLTs on all layers of GPT-2 and Gemma-3-4B-IT
Necessary and Sufficient
- Necessary --> If we ablate this transform, does the expected behavior go away?
- Sufficient --> If we activate this transform, does the hypothesized behavior occur?
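A toy sketch of the two checks, assuming necessity is tested by ablating a transform and sufficiency by forcing it on (steering). Everything here is hypothetical: the scalar `behavior_score`, the gate dictionary, and the transform names are stand-ins for illustration, not the MOLT code, where transforms act on activation vectors and the behavior metric would come from the translation task.

```python
from typing import Dict

Gates = Dict[str, float]  # hypothetical: transform name -> gate activation


def behavior_score(gates: Gates, weights: Dict[str, float]) -> float:
    """Toy behavior metric: weighted sum of the transform gate activations."""
    return sum(weights[name] * gate for name, gate in gates.items())


def necessity_effect(gates: Gates, weights: Dict[str, float], name: str) -> float:
    """Drop in the behavior score when transform `name` is ablated (gate set to 0)."""
    ablated = {**gates, name: 0.0}
    return behavior_score(gates, weights) - behavior_score(ablated, weights)


def sufficiency_effect(
    gates: Gates, weights: Dict[str, float], name: str, strength: float = 1.0
) -> float:
    """Gain in the behavior score when transform `name` is forced to `strength`."""
    steered = {**gates, name: strength}
    return behavior_score(steered, weights) - behavior_score(gates, weights)


# Toy values, chosen only to exercise the functions.
weights = {"translate": 2.0, "other": 0.5}
prompt_with_behavior = {"translate": 1.0, "other": 1.0}
prompt_without_behavior = {"translate": 0.0, "other": 1.0}

necessity = necessity_effect(prompt_with_behavior, weights, "translate")
sufficiency = sufficiency_effect(prompt_without_behavior, weights, "translate")
```

A large `necessity` value means the behavior depends on the transform; a large `sufficiency` value means switching the transform on is enough to produce it in a context where it was absent.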
WandB Runs
- MOLT Training with Varying Sparsity: sparsity values of 1e-4 and 1.5e-4 give a balanced MSE vs. L0 trade-off
- Rebase MOLT on Master Branch: Verify that after the rebase the runs are essentially the same
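One way to make "essentially the same" concrete is a per-metric tolerance check on the logged curves. This is a minimal sketch under assumptions: the function name `runs_match`, the 2% relative tolerance, and the metric names are hypothetical, and the curves are assumed to be already exported as plain lists of per-step values rather than fetched through any particular API.

```python
import math
from typing import Dict, List


def runs_match(
    before: Dict[str, List[float]],
    after: Dict[str, List[float]],
    rel_tol: float = 0.02,
) -> Dict[str, bool]:
    """For each metric logged in both runs, check step-by-step agreement within rel_tol."""
    result = {}
    for name in sorted(before.keys() & after.keys()):
        a, b = before[name], after[name]
        # Curves must have the same number of steps and agree at every step.
        result[name] = len(a) == len(b) and all(
            math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b)
        )
    return result


# Toy curves for illustration only, not values from the actual runs.
pre_rebase = {"mse": [1.0, 0.5, 0.25], "l0": [40.0, 20.0, 10.0]}
post_rebase = {"mse": [1.0, 0.505, 0.25], "l0": [40.0, 20.0, 12.0]}

report = runs_match(pre_rebase, post_rebase)
```

Any metric reported as `False` is worth inspecting by eye before concluding the rebase changed behavior, since seed or nondeterminism differences can also exceed a tight tolerance.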