Adadelta optimizer
Difficulty: Medium
Tags: numpy, optimization, optimizer, adaptive, adadelta
Background
Adadelta extends Adagrad so that the effective learning rate no longer decays toward zero, and it removes the need to set a global learning rate at all. It keeps two exponential moving averages (EMAs): one of squared gradients and one of squared parameter updates. Each step is the gradient scaled by the ratio of their RMS values, so the update has the same units as the parameter.
Problem statement
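Implement adadelta_optimizer(parameter, grad, u, v, rho=0.95, epsilon=1e-6) for one step. Writing $\theta$ for parameter, $g$ for grad, $\rho$ for rho, and $\epsilon$ for epsilon:
$$u \leftarrow \rho\,u + (1-\rho)\,g^2$$
$$\Delta\theta = -\frac{\sqrt{v+\epsilon}}{\sqrt{u+\epsilon}}\,g \quad \text{(new } u \text{, old } v \text{)}$$
$$v \leftarrow \rho\,v + (1-\rho)\,\Delta\theta^2$$
$$\theta \leftarrow \theta + \Delta\theta$$
Return the updated parameter, $u$, and $v$, rounded to 5 decimals.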
Input
- parameter, grad: current value(s) and gradient.
- u: EMA of squared gradients (starts at 0).
- v: EMA of squared updates (starts at 0).
- rho: float, the EMA decay.
- epsilon: float, a small constant that keeps the square roots away from zero.
Output
Returns (updated_parameter, updated_u, updated_v), each rounded to 5 decimals.
Examples
Example 1
Input: parameter = 1.0, grad = 0.1, u = 1.0, v = 1.0, rho = 0.95, epsilon = 1e-6
Output: (0.89743, 0.9505, 0.95053)
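Explanation: $u = 0.95 \cdot 1.0 + 0.05 \cdot 0.1^2 = 0.9505$; $\Delta\theta = -\frac{\sqrt{1.0 + 10^{-6}}}{\sqrt{0.9505 + 10^{-6}}} \cdot 0.1 \approx -0.10257$; then $v$ updates to $0.95 \cdot 1.0 + 0.05 \cdot 0.10257^2 \approx 0.95053$ and the parameter to $1.0 - 0.10257 = 0.89743$.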
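The arithmetic in Example 1 can be checked with a few lines of plain Python. This is only a sketch: it assumes epsilon sits inside both square roots and that the step uses the freshly updated u together with the previous v. The names `u_new`, `step`, `v_new`, and `stale_step` are illustrative and not part of the problem.

```python
import math

parameter, grad, u, v = 1.0, 0.1, 1.0, 1.0
rho, epsilon = 0.95, 1e-6

# 1) EMA of squared gradients, updated first.
u_new = rho * u + (1 - rho) * grad**2

# 2) Step magnitude: RMS of past updates over RMS of gradients, times grad.
step = math.sqrt(v + epsilon) / math.sqrt(u_new + epsilon) * grad

# 3) EMA of squared updates, fed with the step just computed.
v_new = rho * v + (1 - rho) * step**2

# 4) Move against the gradient.
parameter_new = parameter - step

print(round(parameter_new, 5), round(u_new, 5), round(v_new, 5))
# prints: 0.89743 0.9505 0.95053

# For contrast: using the pre-update u in the denominator gives a different
# parameter (0.9), which does not match the stated output.
stale_step = math.sqrt(v + epsilon) / math.sqrt(u + epsilon) * grad
print(round(parameter - stale_step, 5))
```

The second print shows why the order of operations matters: only the variant that updates u before computing the step reproduces the example.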
Constraints
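- Update $u$ first; compute $\Delta\theta = -\frac{\sqrt{v+\epsilon}}{\sqrt{u+\epsilon}}\,g$ with the new $u$ and the old $v$ (here $g$ is grad and $\theta$ is parameter); then update $v$ with $\Delta\theta^2$.
- $\theta \leftarrow \theta + \Delta\theta$ (the sign is already in $\Delta\theta$).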
- Round all three outputs to 5 decimals.
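One possible way to put these constraints together is sketched below with NumPy. It is not the reference solution: it assumes epsilon is added inside both square roots (the usual Adadelta form), converts inputs with `np.asarray` so scalars and arrays share one code path, and uses the local names `u_new`, `delta`, and `v_new`, which are illustrative choices.

```python
import numpy as np


def adadelta_optimizer(parameter, grad, u, v, rho=0.95, epsilon=1e-6):
    """One Adadelta step; returns (parameter, u, v), each rounded to 5 decimals."""
    parameter = np.asarray(parameter, dtype=float)
    grad = np.asarray(grad, dtype=float)
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)

    # Update the squared-gradient EMA first.
    u_new = rho * u + (1.0 - rho) * grad**2

    # Signed step: new u in the denominator, old v in the numerator.
    delta = -np.sqrt(v + epsilon) / np.sqrt(u_new + epsilon) * grad

    # Update the squared-update EMA with the step just taken.
    v_new = rho * v + (1.0 - rho) * delta**2

    # The sign is already in delta, so the parameter update is an addition.
    parameter_new = parameter + delta

    return (
        np.round(parameter_new, 5),
        np.round(u_new, 5),
        np.round(v_new, 5),
    )
```

Called as `adadelta_optimizer(1.0, 0.1, 1.0, 1.0)`, this sketch should reproduce the values in Example 1. Because every operation is elementwise, array inputs of matching shape should work without changes.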
Notes
- The RMS ratio makes the update unit-consistent (same units as the parameter), which is why Adadelta needs no learning-rate hyperparameter.
- $v$ (the update accumulator) supplies the numerator RMS, a one-step-lagged estimate of how large the steps have been.
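A small numerical illustration of the unit-consistency note: if the gradient is rescaled by a constant (for example, by rescaling the loss), the Adadelta step barely changes, whereas a fixed-learning-rate step would scale with it. The helper `first_step` below is hypothetical; it assumes zero-initialised accumulators and epsilon inside both square roots.

```python
import math


def first_step(grad, rho=0.95, epsilon=1e-6):
    """Size of the first Adadelta step starting from u = v = 0 (illustrative helper)."""
    u = (1 - rho) * grad**2          # u after its first update
    rms_updates = math.sqrt(0.0 + epsilon)   # v is still 0 on the first step
    rms_grads = math.sqrt(u + epsilon)
    return rms_updates / rms_grads * grad


base_grad = 0.5
steps = [first_step(scale * base_grad) for scale in (1.0, 1e3, 1e6)]

for scale, step in zip((1.0, 1e3, 1e6), steps):
    # The gradient grows by orders of magnitude; the step stays close to
    # sqrt(epsilon / (1 - rho)).
    print(f"scale={scale:g}  step={step:.6f}")
```

The gradient's scale cancels between numerator and denominator, so only epsilon and rho set the size of the first step. This is an illustration of the claim above, not a proof of it.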
This problem ships 4 hidden tests:
- Reference example
- u is an EMA of squared gradients
- Returns three values; u, v non-negative
- Works elementwise on arrays
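The properties named in this list can be probed locally before running the hidden tests. The sketch below uses an unrounded, hypothetical helper `adadelta_step` (not the required function) on a small array with zero-initialised accumulators; it assumes epsilon inside both square roots.

```python
import numpy as np


def adadelta_step(parameter, grad, u, v, rho=0.95, epsilon=1e-6):
    """Unrounded Adadelta step for array inputs (illustrative helper)."""
    u = rho * u + (1 - rho) * grad**2
    delta = -np.sqrt(v + epsilon) / np.sqrt(u + epsilon) * grad
    v = rho * v + (1 - rho) * delta**2
    return parameter + delta, u, v


theta = np.array([1.0, -2.0, 0.5])
grad = np.array([0.3, -0.7, 0.0])
u = np.zeros_like(theta)  # EMA of squared gradients starts at 0
v = np.zeros_like(theta)  # EMA of squared updates starts at 0

theta_new, u_new, v_new = adadelta_step(theta, grad, u, v)

print(theta_new.shape)                           # shape is preserved
print(bool((u_new >= 0).all()), bool((v_new >= 0).all()))  # EMAs of squares stay non-negative
print(bool(theta_new[2] == theta[2]))            # zero gradient leaves that entry unchanged
```

Each element moves against the sign of its own gradient, independently of the others, which is what "elementwise" means here.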