Lion optimizer step
Difficulty: Medium. Tags: numpy, optimization, optimizer, lion, sign-momentum
Background
Lion (EvoLved Sign Momentum, Google 2023) is a memory-efficient optimizer that updates each parameter by the sign of an interpolated momentum, so every coordinate moves by the same magnitude (the learning rate) at each step. It tracks only one momentum buffer, half of Adam's optimizer state, and often matches or beats AdamW on large models.
Problem statement
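Implement lion_optimizer(parameter, grad, m, learning_rate=0.001, beta1=0.9, beta2=0.99, weight_decay=0.0) for one update step. With parameter $\theta$, gradient $g$, momentum $m$, learning rate $\eta$, and weight decay $\lambda$:
$$c = \beta_1 m + (1 - \beta_1)\, g$$
$$\theta \leftarrow \theta - \eta \left(\operatorname{sign}(c) + \lambda \theta\right)$$
$$m \leftarrow \beta_2 m + (1 - \beta_2)\, g$$
Return the updated parameter and momentum.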
Input
- parameter, grad, m: current value(s), gradient, and momentum buffer (same shape; m starts at 0).
- learning_rate: float, the step size $\eta$.
- beta1, beta2: float, the two interpolation rates.
- weight_decay: float, the decoupled decay coefficient $\lambda$.
Output
Returns (updated_parameter, updated_m).
Examples
Example 1
Input: parameter = 1.0, grad = 0.1, m = 0.0, learning_rate = 0.01, beta1 = 0.9, beta2 = 0.99
Output: (0.99, 0.001)
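Explanation: $c = 0.9 \cdot 0 + 0.1 \cdot 0.1 = 0.01$; $\operatorname{sign}(c) = 1$, so the parameter becomes $1.0 - 0.01 \cdot 1 = 0.99$. Then $m = 0.99 \cdot 0 + 0.01 \cdot 0.1 = 0.001$.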
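The arithmetic of Example 1 can be retraced in a few lines of plain Python. This is only a scalar sketch of the worked example; the names `c` and `direction` are illustrative and do not come from the problem statement.

```python
# Example 1, step by step (scalar case).
parameter, grad, m = 1.0, 0.1, 0.0
learning_rate, beta1, beta2 = 0.01, 0.9, 0.99

# Interpolate momentum and gradient with beta1; only its sign is used.
c = beta1 * m + (1 - beta1) * grad
direction = (c > 0) - (c < 0)  # sign of c: -1, 0, or 1

# Parameter step (weight_decay is 0 here, so there is no decay term).
updated_parameter = parameter - learning_rate * direction

# The momentum buffer update uses beta2, not beta1.
updated_m = beta2 * m + (1 - beta2) * grad

print(round(updated_parameter, 6), round(updated_m, 6))  # 0.99 0.001
```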
Constraints
- The parameter step uses $\operatorname{sign}(\cdot)$ of the $\beta_1$-interpolated momentum, plus decoupled weight decay $\lambda\theta$ (with $\theta$ the parameter).
- Update the momentum with a different rate $\beta_2$, after the parameter step has used the interpolation.
- $\operatorname{sign}(0) = 0$.
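One way the constraints above might translate into NumPy is sketched below. It is a minimal sketch rather than the reference solution: casting the inputs to float arrays is an assumption made here so that scalars and arrays share one code path, and it means scalar inputs come back as 0-d arrays.

```python
import numpy as np

def lion_optimizer(parameter, grad, m, learning_rate=0.001,
                   beta1=0.9, beta2=0.99, weight_decay=0.0):
    """One Lion step; returns (updated_parameter, updated_m)."""
    # Assumption: inputs are scalars or same-shape arrays.
    parameter = np.asarray(parameter, dtype=float)
    grad = np.asarray(grad, dtype=float)
    m = np.asarray(m, dtype=float)

    # 1. Interpolate with beta1; this value only sets the step direction.
    c = beta1 * m + (1.0 - beta1) * grad

    # 2. Sign step plus decoupled weight decay (np.sign(0) is 0).
    updated_parameter = parameter - learning_rate * (
        np.sign(c) + weight_decay * parameter
    )

    # 3. Update the stored momentum with beta2, starting from the old m.
    updated_m = beta2 * m + (1.0 - beta2) * grad
    return updated_parameter, updated_m

# Inputs of Example 1; the result is approximately (0.99, 0.001).
p, new_m = lion_optimizer(1.0, 0.1, 0.0, learning_rate=0.01)
print(round(float(p), 6), round(float(new_m), 6))
```

Computing `updated_m` from the old `m` rather than from `c` keeps the two interpolation rates independent of each other.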
Notes
- Because the update is a pure sign, every coordinate with a nonzero sign moves by exactly $\eta$ (before weight decay), so Lion typically needs a smaller learning rate and a larger weight decay than Adam.
- Using two betas (interpolate with $\beta_1$ for the step, update the buffer with $\beta_2$) is Lion's distinctive trick versus plain sign-momentum.
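Both notes can be checked numerically. The sketch below uses made-up values: the first part shows that gradients of very different scale produce steps of identical size, and the second shows a case where the beta1 interpolation already follows a new, opposing gradient while the beta2 buffer still keeps its old sign. The comparison with a single-beta variant that steps along the sign of the buffer is a hypothetical illustration, not something stated in the problem.

```python
import numpy as np

lr, beta1, beta2 = 0.01, 0.9, 0.99

# Gradients spanning eight orders of magnitude, zero momentum, no weight decay.
grad = np.array([1e-4, -1.0, 1e4])
m = np.zeros_like(grad)
step = lr * np.sign(beta1 * m + (1 - beta1) * grad)
print(np.abs(step))  # every entry equals lr

# Two betas: a large gradient that opposes the stored momentum.
m_old, g = 1.0, -10.0
c = beta1 * m_old + (1 - beta1) * g      # about -0.1: the step follows the new gradient
m_new = beta2 * m_old + (1 - beta2) * g  # about 0.89: the buffer keeps its old sign
print(np.sign(c), np.sign(m_new))
```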
This problem ships 4 hidden tests. They run in your browser via Pyodide, with no backend and no submission queue.
- Reference example
- Step magnitude equals lr when sign is nonzero (no weight decay)
- Negative gradient pushes the parameter up
- Momentum uses beta2