Nesterov accelerated gradientMedium

Nesterov accelerated gradient

Background

Nesterov Accelerated Gradient (NAG) improves on classical momentum with a look-ahead: it first jumps in the direction of the accumulated velocity, evaluates the gradient there, and only then corrects. This anticipatory step gives faster, more stable convergence than plain momentum, which evaluates the gradient at the current point.

Problem statement

Implement nag_optimizer(parameter, grad_fn, velocity, learning_rate=0.01, momentum=0.9) for one step:

θahead=θμv,g=f(θahead)\theta_{\text{ahead}} = \theta - \mu v, \qquad g = \nabla f(\theta_{\text{ahead}}) vμv+ηg,θθvv \leftarrow \mu v + \eta g, \qquad \theta \leftarrow \theta - v

Return the updated parameter and velocity, rounded to 5 decimals.

Input

  • parameter — current parameter value(s).
  • grad_fn — a callable returning the gradient at a given point.
  • velocity — the current momentum buffer.
  • learning_ratefloat, η\eta.
  • momentumfloat, μ[0,1)\mu \in [0, 1).

Output

Returns (updated_parameter, updated_velocity), each rounded to 5 decimals.

Examples

Example 1

Input:  parameter = 1.0, grad_fn = lambda x: x, velocity = 0.1
        (learning_rate = 0.01, momentum = 0.9)
Output: (0.9009, 0.0991)

Explanation: look-ahead =1.00.9(0.1)=0.91=1.0 - 0.9(0.1) = 0.91; the gradient there is 0.910.91. New velocity =0.9(0.1)+0.01(0.91)=0.0991=0.9(0.1)+0.01(0.91)=0.0991, so θ=1.00.0991=0.9009\theta = 1.0 - 0.0991 = 0.9009.

Constraints

  • Evaluate grad_fn at the look-ahead point θμv\theta - \mu v, not at θ\theta.
  • Velocity update: v=μv+ηgv = \mu v + \eta g; parameter update: θ=θv\theta = \theta - v.
  • Round both outputs to 5 decimals.

Notes

  • The look-ahead is the whole point: by "peeking" where momentum is about to carry it, NAG corrects overshoot a step earlier than classical momentum.
  • It underlies many strong CNN recipes (SGD with Nesterov momentum).
Python
Loading...

This problem ships 4 hidden tests. They run in your browser via Pyodide — no backend, no submission queue. Press ▶ Run tests to execute.

  • Reference example
  • Gradient is evaluated at the look-ahead point
  • Velocity follows mu*v + lr*grad
  • momentum = 0 reduces to plain gradient descent