Nesterov accelerated gradient
Background
Nesterov Accelerated Gradient (NAG) improves on classical momentum with a look-ahead: it first jumps in the direction of the accumulated velocity, evaluates the gradient there, and only then corrects. Classical momentum instead evaluates the gradient at the current point, so this anticipatory step often gives faster, more stable convergence.
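To make the difference concrete, here is a minimal sketch of one step of each method on a scalar. The names momentum_step and nag_step are hypothetical, and the classical-momentum form shown assumes the same velocity convention as this problem (the learning rate is folded into the velocity). The only difference between the two functions is the point at which grad_fn is called.

```python
def momentum_step(theta, v, grad_fn, lr=0.01, mu=0.9):
    """Classical momentum: the gradient is taken at the current point."""
    v_new = mu * v + lr * grad_fn(theta)
    return theta - v_new, v_new


def nag_step(theta, v, grad_fn, lr=0.01, mu=0.9):
    """Nesterov: the gradient is taken at the look-ahead point theta - mu * v."""
    v_new = mu * v + lr * grad_fn(theta - mu * v)
    return theta - v_new, v_new


grad = lambda x: x  # gradient of f(x) = x**2 / 2

print("momentum:", momentum_step(1.0, 0.1, grad))
print("nesterov:", nag_step(1.0, 0.1, grad))
```

Because the look-ahead point is already closer to the minimum in this example, NAG sees a smaller gradient and takes a slightly shorter step than classical momentum.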
Problem statement
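Implement nag_optimizer(parameter, grad_fn, velocity, learning_rate=0.01, momentum=0.9) to perform one NAG step. With parameter $\theta$, velocity $v$, learning rate $\eta$, momentum $\mu$, and $\nabla f$ given by grad_fn:
$$\tilde{\theta} = \theta - \mu v, \qquad v' = \mu v + \eta \, \nabla f(\tilde{\theta}), \qquad \theta' = \theta - v'$$
Return the updated parameter and velocity, rounded to 5 decimals.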
Input
- parameter: current parameter value(s).
- grad_fn: a callable returning the gradient at a given point.
- velocity: the current momentum buffer.
- learning_rate: float, default 0.01.
- momentum: float, default 0.9.
Output
Returns (updated_parameter, updated_velocity), each rounded to 5 decimals.
Examples
Example 1
Input: parameter = 1.0, grad_fn = lambda x: x, velocity = 0.1
(learning_rate = 0.01, momentum = 0.9)
Output: (0.9009, 0.0991)
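Explanation: the look-ahead point is $1.0 - 0.9 \cdot 0.1 = 0.91$; the gradient there is $0.91$. The new velocity is $0.9 \cdot 0.1 + 0.01 \cdot 0.91 = 0.0991$, so the updated parameter is $1.0 - 0.0991 = 0.9009$.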
Constraints
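- Evaluate grad_fn at the look-ahead point $\theta - \mu v$ (parameter minus momentum times velocity), not at the current parameter $\theta$.
- Velocity update: $v' = \mu v + \eta \, \nabla f(\theta - \mu v)$, with $\eta$ the learning rate; parameter update: $\theta' = \theta - v'$.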
- Round both outputs to 5 decimals.
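The constraints above can be sketched as follows. This is one possible implementation, not the reference solution: it assumes NumPy is available, that parameter and velocity may be scalars or arrays of matching shape, and that returning NumPy values (rather than plain Python floats) is acceptable to the tests.

```python
import numpy as np


def nag_optimizer(parameter, grad_fn, velocity, learning_rate=0.01, momentum=0.9):
    """One Nesterov accelerated gradient step (illustrative sketch)."""
    # Assumption: inputs are scalars or array-likes of the same shape.
    parameter = np.asarray(parameter, dtype=float)
    velocity = np.asarray(velocity, dtype=float)

    # Look-ahead: where the accumulated velocity is about to carry the parameter.
    lookahead = parameter - momentum * velocity

    # The gradient is evaluated at the look-ahead point, not at `parameter`.
    grad = np.asarray(grad_fn(lookahead), dtype=float)

    # Velocity update, then parameter update.
    new_velocity = momentum * velocity + learning_rate * grad
    new_parameter = parameter - new_velocity

    return np.round(new_parameter, 5), np.round(new_velocity, 5)


# Usage on the example from the problem statement.
p, v = nag_optimizer(1.0, lambda x: x, 0.1)
print(float(p), float(v))
```

Converting the inputs with np.asarray lets the same code path serve scalars and vectors; if the tests expect plain floats for scalar inputs, the results can be wrapped in float() before returning.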
Notes
- The look-ahead is the whole point: by "peeking" at where momentum is about to carry the parameter, NAG corrects overshoot a step earlier than classical momentum.
- It underlies many strong CNN recipes (SGD with Nesterov momentum).
This problem ships 4 hidden tests, which run in the browser via Pyodide (no backend, no submission queue):
- Reference example
- Gradient is evaluated at the look-ahead point
- Velocity follows mu*v + lr*grad
- momentum = 0 reduces to plain gradient descent
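The hidden tests themselves are not shown, so the checks below are only a guess at what the four test names might cover. The sketch uses a hypothetical scalar-only stand-in, nag_step, in place of the function under test, and a small "spy" callable to record where the gradient is evaluated.

```python
def nag_step(theta, v, grad_fn, lr=0.01, mu=0.9):
    # Hypothetical scalar-only stand-in for the function under test.
    v_new = mu * v + lr * grad_fn(theta - mu * v)
    return round(theta - v_new, 5), round(v_new, 5)


# 1. Reference example from the problem statement.
assert nag_step(1.0, 0.1, lambda x: x) == (0.9009, 0.0991)

# 2. Gradient is evaluated at the look-ahead point: record each call.
seen = []

def spy_grad(x):
    seen.append(x)
    return x

nag_step(1.0, 0.1, spy_grad)
assert abs(seen[0] - 0.91) < 1e-12  # 1.0 - 0.9 * 0.1, not 1.0

# 3. Velocity follows mu*v + lr*grad (constant gradient of 2.0).
_, v_new = nag_step(0.0, 0.5, lambda x: 2.0)
assert v_new == 0.47  # 0.9 * 0.5 + 0.01 * 2.0

# 4. With momentum = 0 the step is plain gradient descent.
p_new, _ = nag_step(1.0, 0.1, lambda x: 3.0 * x, lr=0.1, mu=0.0)
assert p_new == round(1.0 - 0.1 * 3.0, 5)

print("all checks passed")
```

The spy in check 2 is the useful trick here: an implementation that calls grad_fn at the current parameter would still produce plausible numbers, but the recorded argument would be 1.0 rather than 0.91.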