Lion optimizer step

Difficulty: Medium

Background

Lion (EvoLved Sign Momentum, Google 2023) is a memory-efficient optimizer that updates each parameter by the sign of an interpolated momentum, so every nonzero update has the same magnitude before weight decay: the learning rate. It tracks only one momentum buffer (half of Adam's optimizer state) and often matches or beats AdamW on large models.

Problem statement

Implement lion_optimizer(parameter, grad, m, learning_rate=0.001, beta1=0.9, beta2=0.99, weight_decay=0.0) for one update step:

$$c = \beta_1 m + (1-\beta_1)\, g, \qquad \theta \leftarrow \theta - \eta\big(\operatorname{sign}(c) + \lambda\theta\big)$$

$$m \leftarrow \beta_2 m + (1-\beta_2)\, g$$

Return the updated parameter and momentum.
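
The update rule above can be sketched for scalar inputs as follows. This is one possible implementation, assuming parameter, grad, and m are plain Python floats; the helper name `_sign` is illustrative and not part of the problem statement.

```python
def _sign(x):
    """Return -1.0, 0.0, or 1.0, with sign(0) = 0 (hypothetical helper)."""
    return float((x > 0) - (x < 0))


def lion_optimizer(parameter, grad, m, learning_rate=0.001,
                   beta1=0.9, beta2=0.99, weight_decay=0.0):
    # Interpolate the old momentum and the gradient with beta1.
    # Only the sign of c is used for the step.
    c = beta1 * m + (1.0 - beta1) * grad
    # Sign step plus decoupled weight decay, both scaled by the learning rate.
    updated_parameter = parameter - learning_rate * (
        _sign(c) + weight_decay * parameter
    )
    # The buffer is updated from the old m with beta2, not from c.
    updated_m = beta2 * m + (1.0 - beta2) * grad
    return updated_parameter, updated_m


# Inputs of Example 1; the result is approximately (0.99, 0.001),
# up to floating-point rounding in the momentum term.
print(lion_optimizer(1.0, 0.1, 0.0, learning_rate=0.01))
```

Computing `updated_m` from the old `m` rather than from `c` is what keeps the two interpolation rates separate.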

Input

  • parameter, grad, m: current value(s), gradient, and momentum buffer (same shape; m starts at 0).
  • learning_rate: float, the step size $\eta$.
  • beta1, beta2: float, the two interpolation rates $\beta_1$ and $\beta_2$.
  • weight_decay: float, the decoupled decay coefficient $\lambda$.
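
Since the inputs may hold several values of the same shape, the step can also be applied elementwise. The sketch below assumes flat Python lists of floats; the name `lion_optimizer_elementwise` and the length check are illustrative choices, not requirements of the problem.

```python
def lion_optimizer_elementwise(parameter, grad, m, learning_rate=0.001,
                               beta1=0.9, beta2=0.99, weight_decay=0.0):
    """Apply one Lion step to equal-length lists of floats (hypothetical variant)."""
    if not (len(parameter) == len(grad) == len(m)):
        raise ValueError("parameter, grad, and m must have the same length")
    new_parameter, new_m = [], []
    for theta, g, mom in zip(parameter, grad, m):
        c = beta1 * mom + (1.0 - beta1) * g
        sign_c = (c > 0) - (c < 0)  # -1, 0, or 1
        new_parameter.append(
            theta - learning_rate * (sign_c + weight_decay * theta)
        )
        new_m.append(beta2 * mom + (1.0 - beta2) * g)
    return new_parameter, new_m


params, moms = lion_optimizer_elementwise(
    [1.0, -2.0, 0.5], [0.1, -0.3, 0.0], [0.0, 0.0, 0.0], learning_rate=0.01
)
# The third coordinate has zero gradient and zero momentum, so it does not move.
print(params, moms)
```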

Output

Returns (updated_parameter, updated_m).

Examples

Example 1

Input:  parameter = 1.0, grad = 0.1, m = 0.0, learning_rate = 0.01, beta1 = 0.9, beta2 = 0.99
Output: (0.99, 0.001)

Explanation: $c = 0.9(0) + 0.1(0.1) = 0.01$; $\operatorname{sign}(c) = 1$, so $\theta = 1.0 - 0.01(1) = 0.99$. Then $m = 0.99(0) + 0.01(0.1) = 0.001$.

Constraints

  • The parameter step uses the $\operatorname{sign}$ of the $\beta_1$-interpolated momentum, plus decoupled weight decay $\lambda\theta$.
  • Update the momentum $m$ with a different rate $\beta_2$, after the parameter step has used the $\beta_1$ interpolation.
  • $\operatorname{sign}(0) = 0$.
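
The $\operatorname{sign}(0) = 0$ constraint is easy to miss with some standard-library tools. The short sketch below contrasts a comparison-based sign (the helper name `sign` is an assumption for illustration) with `math.copysign`, which follows the sign bit and therefore maps a positive zero to 1.0.

```python
import math


def sign(x):
    # Comparison-based sign: returns -1, 0, or 1, so sign(0) is 0.
    return (x > 0) - (x < 0)


print(sign(0.0), sign(-3.2), sign(5))   # 0 -1 1
print(math.copysign(1.0, 0.0))          # 1.0, which would violate sign(0) = 0

# With c = 0 the sign term vanishes and only weight decay moves the parameter.
theta, lr, weight_decay = 2.0, 0.1, 0.5  # hypothetical values
c = 0.0
theta_next = theta - lr * (sign(c) + weight_decay * theta)
print(theta_next)
```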

Notes

  • Because the update direction is a pure sign, every coordinate with nonzero $c$ moves by exactly $\pm\eta$ (before weight decay). Lion therefore typically needs a smaller learning rate and a larger weight decay than Adam.
  • Using two betas (interpolate with $\beta_1$ for the step, update the buffer with $\beta_2$) is Lion's distinctive trick versus plain sign-momentum.
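
To see why the two rates matter, consider a case where the old momentum and the new gradient disagree. The values below are hypothetical, chosen so that the $\beta_1$ interpolation and the $\beta_2$ buffer update end up with opposite signs.

```python
beta1, beta2 = 0.9, 0.99
m, g = 1.0, -20.0  # hypothetical: positive old momentum, strongly negative gradient

# beta1 weights the fresh gradient more heavily, so c is about -1.1.
c = beta1 * m + (1 - beta1) * g
# beta2 changes the stored buffer slowly, so it stays positive at about 0.79.
new_m = beta2 * m + (1 - beta2) * g

step_direction = (c > 0) - (c < 0)
buffer_direction = (new_m > 0) - (new_m < 0)
print(step_direction, buffer_direction)  # -1 1
```

Here the parameter step reacts to the new gradient immediately, while the stored buffer still points in the old direction. A single-beta sign-momentum update would have to use the same interpolation for both.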

This problem ships 4 hidden tests, which run in the browser via Pyodide:

  • Reference example
  • Step magnitude equals lr when sign is nonzero (no weight decay)
  • Negative gradient pushes the parameter up
  • Momentum uses beta2
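
The hidden tests themselves are not shown, so the checks below are only a guess at what the four names might cover. The function body is a compact scalar sketch of the update rule, repeated so the block runs on its own; all input values and check names are assumptions.

```python
import math


def lion_optimizer(parameter, grad, m, learning_rate=0.001,
                   beta1=0.9, beta2=0.99, weight_decay=0.0):
    # Compact scalar sketch of the update rule from the problem statement.
    c = beta1 * m + (1.0 - beta1) * grad
    sign_c = (c > 0) - (c < 0)
    new_parameter = parameter - learning_rate * (sign_c + weight_decay * parameter)
    new_m = beta2 * m + (1.0 - beta2) * grad
    return new_parameter, new_m


def check_reference_example():
    p, m = lion_optimizer(1.0, 0.1, 0.0, learning_rate=0.01)
    assert math.isclose(p, 0.99) and math.isclose(m, 0.001)


def check_step_magnitude_equals_lr():
    # The gradient scale should not change the step size.
    for g in (1e-6, 3.0, 1e6):
        p, _ = lion_optimizer(0.0, g, 0.0, learning_rate=0.05)
        assert math.isclose(abs(p), 0.05)


def check_negative_gradient_pushes_up():
    p, _ = lion_optimizer(1.0, -0.5, 0.0, learning_rate=0.01)
    assert p > 1.0


def check_momentum_uses_beta2():
    # With beta2 = 0.5 the buffer is 0.5*0.5 + 0.5*1.0 = 0.75;
    # mistakenly using beta1 = 0.9 would give 0.55 instead.
    _, m = lion_optimizer(0.0, 1.0, 0.5, beta1=0.9, beta2=0.5)
    assert math.isclose(m, 0.75)


for check in (check_reference_example, check_step_magnitude_equals_lr,
              check_negative_gradient_pushes_up, check_momentum_uses_beta2):
    check()
print("all checks passed")
```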