Exponential moving average of weights
Background
Keeping an exponential moving average (EMA) of model weights is a cheap trick that often improves final accuracy. Alongside the weights being trained, you maintain a smoothed "shadow" copy that lags behind and averages out the noise of individual SGD steps. At evaluation time you use the EMA weights, which tend to sit closer to the center of the loss basin. The technique is standard in diffusion models, self- and semi-supervised learning (e.g. BYOL, Mean Teacher), and many state-of-the-art training recipes.
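To make the "averages out the noise" intuition concrete, here is a minimal toy sketch. It assumes a single scalar weight that bounces deterministically between 0.5 and 1.5 around a hypothetical basin center of 1.0, standing in for noisy SGD steps, and applies the EMA update with decay = 0.9. All names and values are illustrative only.

```python
# Toy illustration: a weight that oscillates around a "basin center" of 1.0,
# mimicking SGD steps that overshoot back and forth. Values are hypothetical.
center = 1.0
decay = 0.9

def live_weight(step: int) -> float:
    # Alternates 1.5, 0.5, 1.5, ... around the center.
    return center + 0.5 * (-1) ** step

ema = live_weight(0)  # initialise the shadow copy from the first weight
w = ema
for step in range(1, 201):
    w = live_weight(step)
    ema = decay * ema + (1 - decay) * w  # one EMA step

print(f"live weight: {w:.4f}, distance to center: {abs(w - center):.4f}")
print(f"EMA weight:  {ema:.4f}, distance to center: {abs(ema - center):.4f}")
```

The live weight is always 0.5 away from the center, while the EMA settles into a much smaller oscillation of amplitude 0.5 * (1 - decay) / (1 + decay), about 0.026 here. Raising decay shrinks that residual wobble further, at the cost of the EMA lagging more when the center itself moves.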
Problem statement
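Implement ema_update(ema, weights, decay=0.99) that performs one EMA step:
$$\text{ema}_{\text{new}} = \beta \cdot \text{ema} + (1 - \beta) \cdot \text{weights}$$
where $\beta$ is decay. It works on scalars or on np.ndarray inputs (applied elementwise).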
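A minimal sketch of one possible solution, assuming the update ema_new = decay * ema + (1 - decay) * weights. It is not an official reference solution. It uses only ordinary arithmetic operators, so the same line serves Python floats and NumPy arrays (where * and + act elementwise).

```python
import numpy as np

def ema_update(ema, weights, decay=0.99):
    """Return one EMA step: decay * ema + (1 - decay) * weights.

    Works unchanged for Python floats and for np.ndarray inputs, because
    NumPy applies * and + elementwise. Inputs are not modified in place.
    """
    return decay * ema + (1 - decay) * weights

print(ema_update(10.0, 20.0, decay=0.9))  # 11.0
print(ema_update(np.array([0.0, 1.0]), np.array([1.0, 1.0]), decay=0.5))
```

Returning a new value rather than updating in place keeps scalars and arrays on the same code path. A memory-conscious variant for large arrays could instead update ema in place, but that would not work for Python floats, which are immutable.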
Input
- ema: current shadow value(s) (scalar or np.ndarray).
- weights: current model weight value(s), same shape as ema.
- decay: float; higher means slower, smoother tracking.
Output
The updated EMA value(s), same shape as the inputs.
Examples
Example 1
Input: ema = 10.0, weights = 20.0, decay = 0.9
Output: 11.0
Explanation: $0.9 \cdot 10.0 + (1 - 0.9) \cdot 20.0 = 9.0 + 2.0 = 11.0$. The shadow weight moves 10% of the way toward the current weight.
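The phrase "moves 10% of the way toward the current weight" corresponds to an algebraically equivalent way of writing the same step: ema + (1 - decay) * (weights - ema). The sketch below, with hypothetical helper names, checks that both forms agree on Example 1 up to floating-point rounding.

```python
import math

def ema_convex(ema, weights, decay):
    # Convex-combination form: decay * old + (1 - decay) * new.
    return decay * ema + (1 - decay) * weights

def ema_step_toward(ema, weights, decay):
    # "Close a fraction (1 - decay) of the remaining gap" form.
    return ema + (1 - decay) * (weights - ema)

a = ema_convex(10.0, 20.0, 0.9)
b = ema_step_toward(10.0, 20.0, 0.9)
print(a, b, math.isclose(a, b))
```

The second form makes the "fraction of the gap" reading explicit. In general the two can differ in the last bit because of rounding, so comparisons between them should use a tolerance.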
Constraints
- The update is a convex combination: coefficient decay on the old EMA and 1 - decay on the new weights, so decay must lie in [0, 1].
- Support elementwise application over arrays.
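Because the update uses only elementwise arithmetic, the same function can maintain a shadow copy of a whole model. The sketch below assumes, purely for illustration, that a model is a dict mapping hypothetical parameter names to NumPy arrays; real frameworks store parameters differently.

```python
import numpy as np

def ema_update(ema, weights, decay=0.99):
    # Elementwise convex combination; works for arrays of any shape.
    return decay * ema + (1 - decay) * weights

# Hypothetical model: parameter name -> array (names are illustrative only).
params = {
    "layer1.weight": np.ones((2, 3)),
    "layer1.bias": np.zeros(3),
}
# The shadow copy starts as an independent copy of the initial parameters.
shadow = {name: p.copy() for name, p in params.items()}

# Pretend one training step changed the live parameters.
params["layer1.weight"] = params["layer1.weight"] + 1.0  # now all 2.0
params["layer1.bias"] = params["layer1.bias"] - 1.0      # now all -1.0

# One EMA step per parameter array, matched elementwise by name.
shadow = {name: ema_update(shadow[name], params[name], decay=0.9) for name in params}

print(shadow["layer1.weight"])  # every entry is approximately 1.1
print(shadow["layer1.bias"])    # every entry is approximately -0.1
```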
Notes
- A higher decay (e.g. 0.999) gives a slower, smoother average that remembers more history; a lower decay tracks the live weights more closely.
- The EMA has an effective window of roughly $1/(1 - \text{decay})$ steps, so decay = 0.99 averages over the last ~100 updates.
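One way to see where the 1/(1 - decay) rule of thumb comes from: unrolling the recursion, an update made k steps ago ends up with coefficient (1 - decay) * decay**k, so the N most recent updates carry a combined share of 1 - decay**N. The sketch below, using a hypothetical helper recent_share, evaluates that share for N = 1/(1 - decay); as decay approaches 1 it tends to 1 - 1/e, roughly 63%.

```python
import math

def recent_share(decay: float, n: int) -> float:
    # Total coefficient carried by the n most recent updates:
    # sum_{k=0}^{n-1} (1 - decay) * decay**k, which equals 1 - decay**n
    return sum((1 - decay) * decay**k for k in range(n))

for decay in (0.9, 0.99, 0.999):
    window = round(1 / (1 - decay))  # effective window of ~1/(1 - decay) steps
    share = recent_share(decay, window)
    print(f"decay={decay}: last {window} updates carry {share:.3f} of the weight")

print(f"limit 1 - 1/e = {1 - math.exp(-1):.3f}")
```

So "averages over the last ~100 updates" for decay = 0.99 means those updates dominate (about 63% of the weight), not that older updates are dropped entirely.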
Tests
The problem ships 5 hidden tests, covering:
- Reference example
- decay = 1 keeps the EMA unchanged
- decay = 0 snaps the EMA to the current weights
- Works elementwise on arrays
- Converges toward a constant weight over many steps
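The hidden tests themselves are not shown, so the following is only a sketch of self-checks one might write for the five listed behaviours, using plain assertions against a candidate ema_update. The specific inputs (such as 200 steps) are arbitrary choices, not the real test values.

```python
import math
import numpy as np

def ema_update(ema, weights, decay=0.99):
    # Candidate implementation under test.
    return decay * ema + (1 - decay) * weights

# Reference example
assert math.isclose(ema_update(10.0, 20.0, decay=0.9), 11.0)

# decay = 1 keeps the EMA unchanged
assert ema_update(3.0, 7.0, decay=1.0) == 3.0

# decay = 0 snaps the EMA to the current weights
assert ema_update(3.0, 7.0, decay=0.0) == 7.0

# Works elementwise on arrays
ema = np.array([[0.0, 2.0], [4.0, 6.0]])
w = np.array([[2.0, 2.0], [0.0, 10.0]])
assert np.allclose(ema_update(ema, w, decay=0.5), [[1.0, 2.0], [2.0, 8.0]])

# Converges toward a constant weight over many steps
ema = 0.0
for _ in range(200):
    ema = ema_update(ema, 1.0, decay=0.9)
# In exact arithmetic the gap is 0.9**200, far below the tolerance.
assert abs(1.0 - ema) < 1e-6

print("all checks passed")
```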