2026-04-01 | MOLTs

Goal: Mixture of Linear Transforms (MOLTs)

Summary: MOLT transforms collapse to a single transform even with sparsity penalty = 0

Work sessions

In	Out
08:00	09:15
02:00	03:30

Swept the sparsity coefficient over lambda=[0, 1e-5, 3e-5, 1e-4, 3e-4, 1e-3, 4e-3, 1e-2]. Every run ended with a single transform being used (collapse to L0 = 0, even when lambda=0)
The MOLTs paper describes two sparsity penalties (tanh and L0) and two activations (ReLU and JumpReLU)
Experiment 1 used tanh/ReLU. Since the preliminary paper does not specify which pairing it used, we will try the other 3 combinations tomorrow as a sanity check (see collapse below)
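The remaining runs can be enumerated as a small grid. This is only a sketch of the bookkeeping: the lambda values and the tanh/ReLU starting point come from the notes above, while the string labels, the `remaining_runs` helper, and the dict layout are hypothetical and not taken from any training code.

```python
from itertools import product

# Lambda values from the sweep in these notes.
LAMBDAS = [0, 1e-5, 3e-5, 1e-4, 3e-4, 1e-3, 4e-3, 1e-2]

# Labels are hypothetical names for the options mentioned in the notes.
SPARSITY_PENALTIES = ["tanh", "l0"]
ACTIVATIONS = ["relu", "jumprelu"]

# Experiment 1 already covered this pairing.
ALREADY_RUN = {("tanh", "relu")}


def remaining_runs():
    """Return one config dict per (sparsity, activation, lambda) still to run."""
    combos = [
        combo
        for combo in product(SPARSITY_PENALTIES, ACTIVATIONS)
        if combo not in ALREADY_RUN
    ]
    return [
        {"sparsity": sparsity, "activation": activation, "lam": lam}
        for sparsity, activation in combos
        for lam in LAMBDAS
    ]


if __name__ == "__main__":
    runs = remaining_runs()
    print(len(runs), "runs left")
    for run in runs[:3]:
        print(run)
```

Keeping lambda=0 in every pairing matters here, since the collapse shows up even without a sparsity penalty.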

Lambda 0 MOLT Training Collapse
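
A collapse like this can be checked numerically from per-token gate values. The sketch below assumes gates are available as a list of rows (one row per token, one entry per transform) and that a transform counts as active when its gate is nonzero. The function names and the data layout are hypothetical, not the experiment's actual logging code.

```python
def mean_l0(gates):
    """Average number of active (nonzero-gate) transforms per token."""
    if not gates:
        return 0.0
    return sum(sum(1 for g in row if g != 0) for row in gates) / len(gates)


def transform_usage(gates):
    """Fraction of tokens on which each transform is active."""
    if not gates:
        return []
    n_transforms = len(gates[0])
    return [
        sum(1 for row in gates if row[j] != 0) / len(gates)
        for j in range(n_transforms)
    ]


if __name__ == "__main__":
    # Toy example: 2 tokens, 3 transforms, only transform 1 ever fires.
    toy_gates = [[0.0, 2.0, 0.0], [0.0, 1.5, 0.0]]
    print("mean L0:", mean_l0(toy_gates))
    print("usage per transform:", transform_usage(toy_gates))
```

Logging both numbers separates "one transform active on every token" from "no transform active at all", which look similar in a loss curve.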