Basic autograd operationsMedium

Basic autograd operations

Background

Automatic differentiation is the engine behind every deep-learning framework. Each value remembers the operation that produced it and how to push gradients back to its inputs. Running these local rules in reverse topological order — reverse-mode autodiff (backpropagation) — computes the gradient of a final output with respect to every intermediate value in a single backward pass. This problem builds a minimal scalar autograd engine (in the spirit of Karpathy's micrograd).

Problem statement

Implement a Value class supporting +, *, and relu, plus a backward() method.

For each operation, store a local backward rule that accumulates into the inputs' .grad:

(a+b)a=1,(ab)a=b,relu(a)a=1[a>0]\frac{\partial (a+b)}{\partial a}=1,\quad \frac{\partial (a\cdot b)}{\partial a}=b,\quad \frac{\partial\,\text{relu}(a)}{\partial a}=\mathbb{1}[a>0]

backward() builds a topological order of the graph, seeds the output gradient to 1, and calls each node's local rule in reverse.

Input

A graph of Value objects built with Python operators, e.g. d = a + b * c, followed by a call to .backward() on the output node.

Output

After backward(), every Value in the graph has its .grad set to (output)/(that value)\partial(\text{output})/\partial(\text{that value}).

Examples

Example 1

a = Value(2); b = Value(-3); c = Value(10)
d = a + b * c          # d.data = 2 + (-30) = -28
e = d.relu()           # e.data = 0  (since -28 < 0)
e.backward()
# a.grad = 0, b.grad = 0, c.grad = 0

Explanation: relu of a negative number outputs 0 and has zero local gradient, so no gradient flows back — every input gradient is 0.

Constraints

  • Gradients must accumulate (+=), so a value used in multiple places sums its contributions.
  • relu's local gradient is 1 where the input is positive, else 0.
  • backward() must visit nodes in reverse topological order after seeding the output's grad = 1.

Notes

  • Accumulation (+= not =) is what makes x + x produce x.grad == 2 — both edges contribute.
  • The same topological-sort-then-reverse pattern scales to tensors; only the per-op local gradients change.
Python
Loading...

This problem ships 4 hidden tests. They run in your browser via Pyodide — no backend, no submission queue. Press ▶ Run tests to execute.

  • Reference example: ReLU on a negative value zeros all gradients
  • Product-plus-self gradient: z = x*y + x
  • ReLU forward and gradient for a positive input
  • Gradient accumulates when a value is reused: y = x + x