Adadelta optimizer (Medium)

Background

Adadelta extends Adagrad to stop its effective learning rate from decaying toward zero, and it removes the need to set a learning rate at all. It keeps two exponential moving averages (EMAs), one of squared gradients and one of squared parameter updates, and scales each step by the ratio of their RMS values, so the update is automatically unit-consistent.
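To make the decay problem concrete, here is a minimal sketch comparing the two kinds of accumulator. It assumes a constant gradient of 0.5 and 1000 steps, both illustrative values that are not part of the problem. Adagrad divides by the square root of a running sum of squared gradients, which keeps growing, whereas an EMA levels off near the recent squared-gradient magnitude.

```python
# Illustrative values only: a constant gradient and a fixed number of steps.
g = 0.5
rho = 0.95          # EMA decay, same default as in the problem statement

adagrad_sum = 0.0   # Adagrad-style accumulator: running sum of squared gradients
ema = 0.0           # Adadelta-style accumulator: EMA of squared gradients

for _ in range(1000):
    adagrad_sum += g**2
    ema = rho * ema + (1 - rho) * g**2

print(adagrad_sum)  # keeps growing with the number of steps
print(ema)          # settles close to g**2
```

Because the sum grows without bound, a step divided by its square root shrinks toward zero; the EMA stays bounded, so the denominator in Adadelta does not.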

Problem statement

Implement adadelta_optimizer(parameter, grad, u, v, rho=0.95, epsilon=1e-6) for one step:

$$u \leftarrow \rho u + (1-\rho)g^2, \qquad \Delta x = -\frac{\sqrt{v + \epsilon}}{\sqrt{u + \epsilon}}\, g$$

$$v \leftarrow \rho v + (1-\rho)\,\Delta x^2, \qquad \theta \leftarrow \theta + \Delta x$$

Return the updated parameter, $u$, and $v$, rounded to 5 decimals.

Input

  • parameter, grad: current value(s) and gradient.
  • u: EMA of squared gradients (starts at 0).
  • v: EMA of squared updates (starts at 0).
  • rho: float, the EMA decay.
  • epsilon: float, the small constant added inside both square roots.

Output

Returns (updated_parameter, updated_u, updated_v), each rounded to 5 decimals.

Examples

Example 1

Input:  parameter = 1.0, grad = 0.1, u = 1.0, v = 1.0, rho = 0.95, epsilon = 1e-6
Output: (0.89743, 0.9505, 0.95053)

Explanation: $u = 0.95(1)+0.05(0.01)=0.9505$; $\Delta x = -\sqrt{1}/\sqrt{0.9505}\cdot 0.1\approx -0.10257$ (ignoring the small $\epsilon$ terms); then $v = 0.95(1)+0.05(\Delta x^2) \approx 0.95053$ and $\theta = 1 - 0.10257 = 0.89743$.
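The arithmetic in Example 1 can be checked line by line. The sketch below only replays the example's numbers with the standard `math` module; the variable name `delta_x` is chosen here for illustration.

```python
import math

# Inputs from Example 1.
parameter, grad, u, v = 1.0, 0.1, 1.0, 1.0
rho, epsilon = 0.95, 1e-6

u = rho * u + (1 - rho) * grad**2                           # 0.9505
delta_x = -math.sqrt(v + epsilon) / math.sqrt(u + epsilon) * grad
v = rho * v + (1 - rho) * delta_x**2                        # uses the step just taken
parameter = parameter + delta_x                             # sign is already in delta_x

print(round(parameter, 5), round(u, 5), round(v, 5))
# prints: 0.89743 0.9505 0.95053
```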

Constraints

  • Update $u$ first; compute $\Delta x = -\frac{\text{RMS}(v)}{\text{RMS}(u)}g$ with $\text{RMS}(\cdot)=\sqrt{\cdot + \epsilon}$; then update $v$ with $\Delta x^2$.
  • $\theta \leftarrow \theta + \Delta x$ (the sign is already in $\Delta x$).
  • Round all three outputs to 5 decimals.
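One way these constraints might translate into code is sketched below. It assumes NumPy is available and that inputs may be scalars or array-likes; the internal names `u_new`, `delta`, and `v_new` are illustrative, and this is not presented as the platform's reference solution.

```python
import numpy as np

def adadelta_optimizer(parameter, grad, u, v, rho=0.95, epsilon=1e-6):
    # Convert to float arrays so scalars and arrays share one code path.
    parameter = np.asarray(parameter, dtype=float)
    grad = np.asarray(grad, dtype=float)
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)

    # 1. Update the EMA of squared gradients first.
    u_new = rho * u + (1 - rho) * grad**2
    # 2. The step uses the previous v (numerator) and the new u (denominator).
    delta = -np.sqrt(v + epsilon) / np.sqrt(u_new + epsilon) * grad
    # 3. Update the EMA of squared updates with the step just computed.
    v_new = rho * v + (1 - rho) * delta**2
    # 4. Apply the step; the minus sign is already inside delta.
    new_parameter = parameter + delta

    return np.round(new_parameter, 5), np.round(u_new, 5), np.round(v_new, 5)

# Elementwise use on a two-element parameter vector (illustrative inputs).
p, u_out, v_out = adadelta_optimizer([1.0, 2.0], [0.1, 0.1], [1.0, 1.0], [1.0, 1.0])
print(p, u_out, v_out)
```

The ordering matters: computing the step before updating `v` is what keeps the numerator one step behind the denominator.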

Notes

  • The RMS ratio makes the update unit-consistent (same units as $\theta$), which is why Adadelta needs no learning-rate hyperparameter.
  • $v$ (the update accumulator) supplies the numerator RMS: a one-step-lagged estimate of how large the steps have been.
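The unit-consistency claim can be spelled out as a short dimensional check. Writing $[\cdot]$ for "units of" (notation introduced here for illustration) and ignoring $\epsilon$, the accumulators carry $[u] = [g]^2$ and $[v] = [\theta]^2$, so

$$[\Delta x] = \frac{\sqrt{[v]}}{\sqrt{[u]}}\,[g] = \frac{[\theta]}{[g]}\,[g] = [\theta].$$

By contrast, a plain gradient step $-\eta g$ has the units of $g$ unless the learning rate $\eta$ itself carries units $[\theta]/[g]$, which is the role the RMS ratio plays here.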

This problem ships 4 hidden tests, which run in the browser via Pyodide:

  • Reference example
  • u is an EMA of squared gradients
  • Returns three values; u, v non-negative
  • Works elementwise on arrays