Adagrad optimizerEasy

Adagrad optimizer

Background

Adagrad gives every parameter its own learning rate by dividing the global rate by the square root of the sum of all past squared gradients. Parameters with large/frequent gradients take smaller steps; rare-feature parameters take larger steps — handy for sparse data. Its weakness: the accumulator only grows, so the effective rate eventually shrinks toward zero (which RMSprop/Adam fix).

Problem statement

Implement adagrad_optimizer(parameter, grad, G, learning_rate=0.01, epsilon=1e-8) for one update step:

GG+g2,θθηgG+ϵG \leftarrow G + g^2, \qquad \theta \leftarrow \theta - \frac{\eta\, g}{\sqrt{G} + \epsilon}

Return the updated parameter and accumulator, both rounded to 5 decimals.

Input

  • parameter — current parameter value(s) (scalar or np.ndarray).
  • grad — current gradient gg, same shape.
  • G — accumulated squared gradients so far (same shape; starts at 0).
  • learning_ratefloat, the global rate η\eta.
  • epsilonfloat.

Output

Returns a tuple (updated_parameter, updated_G), each rounded to 5 decimals.

Examples

Example 1

Input:  parameter = 1.0, grad = 0.1, G = 1.0
Output: (0.999, 1.01)

Explanation: G=1.0+0.12=1.01G = 1.0 + 0.1^2 = 1.01; the step is 0.010.1/1.010.0009950.01\cdot0.1/\sqrt{1.01} \approx 0.000995, so θ0.999\theta \approx 0.999.

Constraints

  • Update GG before computing the step (accumulate g2g^2).
  • Step =ηg/(G+ϵ)= \eta g/(\sqrt{G}+\epsilon), subtracted from the parameter.
  • Round both outputs to 5 decimals.

Notes

  • GG is the sum (not a running average) of squared gradients, so it never decreases — the per-parameter rate is monotonically non-increasing.
  • That monotonic decay is Adagrad's flaw on long runs; RMSprop replaces the sum with an exponential moving average to keep learning.
Python
Loading...

This problem ships 4 hidden tests. They run in your browser via Pyodide — no backend, no submission queue. Press ▶ Run tests to execute.

  • Reference example
  • Accumulator adds grad squared
  • Larger accumulated G yields a smaller step
  • Works elementwise on arrays