RMSprop optimizer (Easy)

Background

RMSprop fixes Adagrad's vanishing learning rate by tracking an exponential moving average (EMA) of squared gradients instead of their cumulative sum. Because the accumulator can shrink as well as grow, each parameter's step size adapts to its recent gradient magnitude. This makes RMSprop a common default for RNNs and for online (non-stationary) settings.
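To see the "can shrink as well as grow" point concretely, the sketch below feeds the same gradient sequence to both accumulators. The sequence (three large gradients followed by three small ones) is made up purely for illustration, and only the accumulators are tracked, not the parameter.

```python
# Compare Adagrad's accumulator (cumulative sum of g^2) with RMSprop's
# accumulator (EMA of g^2). Gradient values are illustrative only.
grads = [1.0, 1.0, 1.0, 0.1, 0.1, 0.1]
beta = 0.9

adagrad_G = 0.0
rmsprop_G = 0.0
adagrad_hist = []
rmsprop_hist = []
for g in grads:
    adagrad_G += g * g                                  # never decreases
    rmsprop_G = beta * rmsprop_G + (1 - beta) * g * g   # decays toward recent g^2
    adagrad_hist.append(adagrad_G)
    rmsprop_hist.append(rmsprop_G)

for step, (a, r) in enumerate(zip(adagrad_hist, rmsprop_hist), start=1):
    print(f"step {step}: adagrad G = {a:.4f}, rmsprop G = {r:.4f}")
```

Adagrad's $G$ climbs 1, 2, 3, 3.01, 3.02, 3.03, while the RMSprop $G$ peaks at 0.271 on step 3 and then falls once the gradients become small, so its step size grows back.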

Problem statement

Implement rmsprop_optimizer(parameter, grad, G, learning_rate=0.01, beta=0.9, epsilon=1e-8) for one update step:

$$G \leftarrow \beta G + (1-\beta)\,g^2, \qquad \theta \leftarrow \theta - \frac{\eta\, g}{\sqrt{G} + \epsilon}$$

Return the updated parameter and accumulator, rounded to 5 decimals.
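One way to implement the two update rules is a minimal NumPy sketch like the one below. Accepting scalars and arrays alike via `np.asarray`, and returning plain floats for scalar inputs but NumPy arrays for array inputs, are assumptions of this sketch; the grader may expect a different container type for arrays.

```python
import numpy as np

def rmsprop_optimizer(parameter, grad, G, learning_rate=0.01, beta=0.9, epsilon=1e-8):
    """One RMSprop step on scalars or equally shaped array-likes (sketch)."""
    parameter = np.asarray(parameter, dtype=float)
    grad = np.asarray(grad, dtype=float)
    G = np.asarray(G, dtype=float)

    # 1) Update the EMA of squared gradients first.
    G = beta * G + (1 - beta) * grad ** 2
    # 2) Scale the step elementwise by the *new* accumulator.
    parameter = parameter - learning_rate * grad / (np.sqrt(G) + epsilon)

    # Round both outputs to 5 decimals.
    parameter = np.round(parameter, 5)
    G = np.round(G, 5)
    if parameter.ndim == 0:
        # Scalar inputs: return plain Python floats.
        return float(parameter), float(G)
    return parameter, G
```

Calling `rmsprop_optimizer(1.0, 0.1, 1.0)` reproduces Example 1. Note that only the returned values are rounded; the update itself is computed in full precision.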

Input

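  • parameter, grad, G: the current parameter value(s), the gradient, and the EMA accumulator (all the same shape; G starts at 0).
  • learning_rate: float, the step size $\eta$ (default 0.01).
  • beta: float, the EMA decay (default 0.9).
  • epsilon: float, a small constant added to the denominator for numerical stability (default 1e-8).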

Output

Returns (updated_parameter, updated_G), each rounded to 5 decimals.

Examples

Example 1

Input:  parameter = 1.0, grad = 0.1, G = 1.0, learning_rate = 0.01, beta = 0.9
Output: (0.99895, 0.901)

Explanation: $G = 0.9(1.0) + 0.1(0.1^2) = 0.901$; the step is $0.01 \cdot 0.1/(\sqrt{0.901} + \epsilon) \approx 0.001054$ (the default $\epsilon = 10^{-8}$ is negligible here), so $\theta \approx 1.0 - 0.001054 \approx 0.99895$.
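The arithmetic in Example 1 can be checked directly with the standard library; this is just the two formulas evaluated with the example's inputs and the default epsilon.

```python
import math

# Example 1 inputs; epsilon left at its default of 1e-8.
theta, g, G = 1.0, 0.1, 1.0
eta, beta, eps = 0.01, 0.9, 1e-8

G_new = beta * G + (1 - beta) * g ** 2      # 0.9 + 0.001 = 0.901
step = eta * g / (math.sqrt(G_new) + eps)   # about 0.0010535
theta_new = theta - step                    # about 0.9989465

print(round(theta_new, 5), round(G_new, 5))  # 0.99895 0.901
```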

Constraints

  • Update $G$ as an EMA, $\beta G + (1-\beta)g^2$, before computing the step, so the step uses the new $G$.
  • The step is $\eta g/(\sqrt{G}+\epsilon)$, subtracted from the parameter.
  • Round both outputs to 5 decimals.
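The ordering constraint matters most on the very first step, when $G = 0$. The sketch below contrasts the required order with the wrong one; the helper names `step_new_G` and `step_stale_G` and the inputs are hypothetical, chosen only for illustration.

```python
import math

def step_new_G(theta, g, G, eta=0.01, beta=0.9, eps=1e-8):
    # Required order: update the accumulator, then use it in the step.
    G = beta * G + (1 - beta) * g * g
    return theta - eta * g / (math.sqrt(G) + eps), G

def step_stale_G(theta, g, G, eta=0.01, beta=0.9, eps=1e-8):
    # Wrong order, shown only for contrast: the step uses the old G.
    theta = theta - eta * g / (math.sqrt(G) + eps)
    G = beta * G + (1 - beta) * g * g
    return theta, G

# First step from G = 0 with the same gradient.
print(step_new_G(1.0, 0.1, 0.0))    # theta is about 0.968
print(step_stale_G(1.0, 0.1, 0.0))  # theta is about -99999
```

With the required order, the first step from $G = 0$ has magnitude $\eta|g|/\sqrt{(1-\beta)g^2} = \eta/\sqrt{1-\beta} \approx 3.16\,\eta$ (ignoring $\epsilon$), whatever the size of $g$. With the stale $G$, the denominator is just $\epsilon$, and here the parameter jumps by about $10^5$.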

Notes

  • Versus Adagrad: replacing the cumulative sum with an EMA lets $G$ forget old gradients, so the effective learning rate $\eta/(\sqrt{G}+\epsilon)$ does not decay toward zero on long runs.
  • Adam is essentially RMSprop's second-moment EMA plus a first-moment (momentum) EMA, with bias correction applied to both moments.
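The long-run claim in the first note can be illustrated with a constant gradient. The sketch below assumes $g = 0.5$ on every step for 10,000 steps (an arbitrary illustrative choice) and compares the effective learning rate $\eta/(\sqrt{G}+\epsilon)$ under each accumulator.

```python
import math

eta, beta, eps, g = 0.01, 0.9, 1e-8, 0.5  # illustrative constant gradient
adagrad_G = 0.0
rmsprop_G = 0.0
for _ in range(10_000):
    adagrad_G += g * g                                  # grows linearly: t * g^2
    rmsprop_G = beta * rmsprop_G + (1 - beta) * g * g   # converges to g^2

adagrad_rate = eta / (math.sqrt(adagrad_G) + eps)
rmsprop_rate = eta / (math.sqrt(rmsprop_G) + eps)
print(f"Adagrad effective rate after 10k steps: {adagrad_rate:.2e}")
print(f"RMSprop effective rate after 10k steps: {rmsprop_rate:.2e}")
```

Adagrad's $G$ reaches $10{,}000 \cdot 0.25 = 2500$, so its effective rate falls to $0.01/50 = 2 \times 10^{-4}$ and keeps shrinking like $1/\sqrt{t}$, whereas RMSprop's $G$ settles at $g^2 = 0.25$ and its effective rate stays near $0.01/0.5 = 0.02$.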

This problem ships 4 hidden tests:

  • Reference example
  • Accumulator is an EMA of squared gradients
  • beta = 0 sets G to grad squared
  • Works elementwise on arrays
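Two of the listed behaviours can be checked locally with a short sketch. The helper `rmsprop_step` is hypothetical, the inputs are made up (the hidden tests' actual inputs are not shown), and rounding is omitted to keep the checks exact.

```python
import numpy as np

# Hypothetical unrounded helper implementing the two update rules.
def rmsprop_step(theta, g, G, eta=0.01, beta=0.9, eps=1e-8):
    G = beta * G + (1 - beta) * g ** 2
    return theta - eta * g / (np.sqrt(G) + eps), G

# beta = 0: the accumulator keeps no history and becomes g**2.
_, G = rmsprop_step(1.0, 0.3, 5.0, beta=0.0)
assert np.isclose(G, 0.09)

# Elementwise on arrays: each entry is updated independently.
theta = np.array([1.0, 2.0, -1.0])
g = np.array([0.1, -0.2, 0.0])
theta_new, G_new = rmsprop_step(theta, g, np.zeros(3))
assert np.allclose(G_new, 0.1 * g ** 2)
# A zero gradient with G = 0 leaves the entry unchanged; epsilon avoids 0/0.
assert theta_new[2] == theta[2]
```

With $\beta = 0$ the step reduces to $\eta g/(|g| + \epsilon)$, which is roughly $\eta \cdot \operatorname{sign}(g)$.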