ReLU forward + backward
Background
ReLU, defined as max(0, x), is the workhorse non-linearity that made deep networks tractable: it is cheap, does not saturate on the positive side, and has a gradient of exactly 1 wherever it is active. Implementing both halves (forward and backward) is the smallest complete example of how a layer participates in backpropagation: the backward pass routes the upstream gradient through only the positions that were active in the forward pass.
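Written out, the forward map and the gradient routing described above take the following form. The symbols are chosen here for illustration and are not from the original text: $L$ is the loss, $x_i$ an input entry, $y_i$ the matching output entry, and $\mathbf{1}[\cdot]$ the indicator function.

```latex
\[
y_i = \max(0, x_i), \qquad
\frac{\partial L}{\partial x_i}
  = \frac{\partial L}{\partial y_i}\,\mathbf{1}[x_i > 0]
\]
```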
Problem statement
Implement two functions:
relu(x) applies max(0, x) elementwise. relu_backward(grad_out, x) multiplies the upstream gradient by the mask x > 0 (1 where the input was positive, 0 elsewhere). Note that relu_backward takes the input x, not the output.
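One possible shape for the two functions is sketched below. It is a minimal illustration under the assumptions stated on this page (NumPy arrays in, strict x > 0 mask), not the reference solution; the type hints and docstrings are additions for readability.

```python
import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    """Forward pass: max(0, x) applied elementwise."""
    return np.maximum(x, 0)


def relu_backward(grad_out: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Backward pass: keep the upstream gradient only where x > 0."""
    # The boolean mask is promoted to 0/1 by the multiplication.
    return grad_out * (x > 0)


x = np.array([-2.0, -0.1, 0.0, 0.1, 5.0])
print(relu(x))                            # forward values
print(relu_backward(np.ones_like(x), x))  # gradient w.r.t. x for an all-ones upstream gradient
```

Both functions are shape-preserving because `np.maximum` and the mask multiplication operate elementwise.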
Input
- relu(x): x is an np.ndarray of any shape.
- relu_backward(grad_out, x): grad_out is the gradient of the loss w.r.t. the layer's output (same shape as x); x is the input that was fed to relu (same shape).
Output
- relu returns an array of the same shape with max(0, x) applied elementwise.
- relu_backward returns the gradient of the loss w.r.t. x, same shape. At exactly x = 0 the sub-gradient is 0.
Examples
Example 1 — forward zeroes the negatives
Input: x = [-2.0, -0.1, 0.0, 0.1, 5.0]
Output: [0.0, 0.0, 0.0, 0.1, 5.0]
Explanation: every negative entry maps to 0, and 0 itself stays 0; positive entries pass through unchanged.
Example 2 — backward routes the gradient through active units
Input: x = [-1.0, 0.0, 2.0, -3.0, 4.0], grad_out = [1, 1, 1, 1, 1]
Output: [0.0, 0.0, 1.0, 0.0, 1.0]
Explanation: the mask is [0,0,1,0,1], so the upstream gradient survives only at the positions where the input was strictly positive. At x = 0 the gradient is 0 (the chosen sub-gradient).
Constraints
- The sub-gradient at exactly x = 0 is 0: use a strict x > 0 mask (the convention PyTorch, JAX, and TensorFlow all follow).
- relu_backward consumes the input x, not the output. (For ReLU specifically out > 0 ⟺ x > 0, but the course convention passes the input so it generalises to other activations.)
- The boolean mask x > 0 casts to {0.0, 1.0} when multiplied, so no explicit .astype(float) is needed.
- Output shape always matches the input; the analytic backward must match a finite-difference gradient (atol ≈ 1e-3).
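The finite-difference constraint above can be checked locally with a sketch like the following. The helper `numerical_grad`, the surrogate loss, the step size `eps=1e-5`, and the random test data are all assumptions made for illustration; the hidden test may be set up differently. The one-line `relu` and `relu_backward` are included only so the snippet runs on its own.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0)


def relu_backward(grad_out, x):
    return grad_out * (x > 0)


def numerical_grad(f, x, eps=1e-5):
    """Central-difference gradient of a scalar function f at x."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        orig = x[idx]
        x[idx] = orig + eps
        f_plus = f(x)
        x[idx] = orig - eps
        f_minus = f(x)
        x[idx] = orig  # restore the entry before moving on
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
# Move entries away from the kink at 0, where finite differences are unreliable.
x = np.where(np.abs(x) < 0.1, 0.5, x)
grad_out = rng.normal(size=x.shape)


def loss(z):
    # Scalar surrogate loss whose gradient w.r.t. relu(z) is exactly grad_out.
    return np.sum(relu(z) * grad_out)


analytic = relu_backward(grad_out, x)
numeric = numerical_grad(loss, x)
assert np.allclose(analytic, numeric, atol=1e-3)
```

The check deliberately avoids inputs near 0: ReLU is not differentiable there, so a central difference would report roughly half the upstream gradient rather than the chosen sub-gradient of 0.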
Notes
- Dying ReLU. This zero-at-and-below-0 gradient is exactly why a unit whose pre-activation stays ≤ 0 across all training data gets zero gradient forever and never recovers. That failure mode is the motivation for variants like LeakyReLU and GELU.
- Series. Step 2 of build-nn; this non-linearity sits between the linear layers of the MLP you assemble later in the track.
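The dying-ReLU note above can be illustrated with a small sketch. Everything in it is hypothetical: a single hidden unit with made-up inputs `X`, weights `w`, a deliberately large negative bias `b`, an all-ones upstream gradient, and an assumed LeakyReLU slope of 0.01 for comparison.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))   # hypothetical training inputs
w = rng.normal(size=3)          # weights of a single hidden unit
b = -100.0                      # bias so negative that the unit is never active

z = X @ w + b                   # pre-activations, all negative for this data
grad_out = np.ones_like(z)      # stand-in upstream gradient

# ReLU: the mask is all zeros, so no gradient reaches w or b.
grad_z_relu = grad_out * (z > 0)
grad_w_relu = X.T @ grad_z_relu
grad_b_relu = grad_z_relu.sum()

# LeakyReLU with an assumed negative slope of 0.01 keeps a small gradient alive.
alpha = 0.01
grad_z_leaky = grad_out * np.where(z > 0, 1.0, alpha)
grad_w_leaky = X.T @ grad_z_leaky

print(np.all(z < 0), grad_w_relu, grad_b_relu)
print(grad_w_leaky)
```

With the ReLU mask, a gradient step leaves `w` and `b` unchanged, so the unit produces the same negative pre-activations on the next pass; the leaky variant still receives a small, non-zero update.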
This problem ships 5 hidden tests. They run in your browser via Pyodide, with no backend and no submission queue.
- relu: output shape matches input
- relu: zeroes out negatives, keeps non-negatives
- relu_backward: grad passes through where x > 0, blocked where x <= 0
- relu_backward: at exactly x=0 sub-gradient is 0 (the convention)
- relu_backward: matches finite-difference gradient