2026-04-01 | MOLTs
Goal: Mixture of Linear Transforms (MOLTs)
Summary: MOLT transforms collapse to a single transform, even with sparsity penalty = 0
Work sessions
| In | Out |
|---|---|
| 08:00 | 09:15 |
| 02:00 | 03:30 |
- Swept the sparsity coefficient over lambda = [0, 1e-5, 3e-5, 1e-4, 3e-4, 1e-3, 4e-3, 1e-2]. Every run ended with a single transform being used (collapse to L0 = 0, even when lambda = 0).
- The MOLTs paper describes two sparsity penalties (tanh and L0) and two activations (ReLU and JumpReLU).
- In experiment 1 we used tanh/ReLU. Since the preliminary paper does not specify which combination it uses, we will try the other 3 combinations as a sanity check tomorrow (see collapse below).
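A rough sketch of how tomorrow's sanity check could be laid out, plus a helper for checking which transforms are in use. The string labels (`"tanh"`, `"l0"`, `"relu"`, `"jumprelu"`), the function names, and the `gates` layout (one list of per-transform gate activations per token) are all hypothetical and are not taken from the paper or from our training code.

```python
from itertools import product

# Lambda values from the sweep in this entry.
LAMBDAS = [0, 1e-5, 3e-5, 1e-4, 3e-4, 1e-3, 4e-3, 1e-2]

# Hypothetical labels for the two choices discussed above.
SPARSITY_PENALTIES = ["tanh", "l0"]
ACTIVATIONS = ["relu", "jumprelu"]
ALREADY_RUN = ("tanh", "relu")  # experiment 1


def remaining_combinations():
    """Return the (penalty, activation) pairs not yet tried."""
    return [c for c in product(SPARSITY_PENALTIES, ACTIVATIONS) if c != ALREADY_RUN]


def mean_l0(gates, eps=0.0):
    """Average number of transforms with gate activation > eps per token.

    gates: list of tokens, each a list with one gate activation per transform
    (assumed layout).
    """
    if not gates:
        raise ValueError("gates must contain at least one token")
    active_per_token = [sum(1 for g in token if g > eps) for token in gates]
    return sum(active_per_token) / len(gates)


def transforms_ever_active(gates, eps=0.0):
    """Indices of transforms that fire on at least one token."""
    n_transforms = len(gates[0])
    return [i for i in range(n_transforms) if any(tok[i] > eps for tok in gates)]


if __name__ == "__main__":
    # One planned run per remaining combination and lambda value.
    for penalty, activation in remaining_combinations():
        for lam in LAMBDAS:
            print(f"planned run: penalty={penalty} activation={activation} lambda={lam}")

    # Toy gate activations: 2 tokens, 3 transforms (made-up numbers).
    toy_gates = [[0.0, 0.7, 0.0], [0.0, 0.0, 0.0]]
    print(mean_l0(toy_gates), transforms_ever_active(toy_gates))
```

Logging both numbers per run should make it easier to tell apart "few transforms active per token" from "the same transform active everywhere", which the mean L0 alone does not distinguish.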
