ReLU forward + backward

Difficulty: Easy

Background

ReLU, $\max(0, x)$, is the workhorse non-linearity that made deep networks tractable: it is cheap, does not saturate on the positive side, and has a gradient of exactly 1 wherever it is active. Implementing both halves (forward and backward) is the smallest complete example of how a layer participates in backpropagation: the backward pass routes the upstream gradient through only the positions that were active in the forward pass.

Problem statement

Implement two functions:

$$\text{forward:}\quad y = \max(0, x), \qquad\qquad \text{backward:}\quad \frac{\partial L}{\partial x} = \frac{\partial L}{\partial y}\;\cdot\;\mathbf{1}[x > 0]$$

relu(x) applies $\max(0, x)$ elementwise. relu_backward(grad_out, x) multiplies the upstream gradient by the mask $\mathbf{1}[x > 0]$ (1 where the input was positive, 0 elsewhere). Note that relu_backward takes the input x, not the output.
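The two formulas above can be sketched in a few lines of NumPy. This is one minimal way to write them, assuming the inputs are NumPy arrays as the problem describes; it is not necessarily the reference solution.

```python
import numpy as np


def relu(x):
    """Elementwise max(0, x); the output has the same shape as x."""
    return np.maximum(0, x)


def relu_backward(grad_out, x):
    """Gradient of the loss w.r.t. x: upstream gradient times the mask 1[x > 0]."""
    # The strict comparison gives a zero gradient at exactly x == 0.
    return grad_out * (x > 0)
```

np.maximum broadcasts the scalar 0 against x, so the same code handles arrays of any shape.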

Input

  • relu(x): x is an np.ndarray of any shape.
  • relu_backward(grad_out, x): grad_out is the gradient of the loss w.r.t. the layer's output (same shape as x); x is the input that was fed to relu (same shape).

Output

  • relu returns an array of the same shape with $\max(0, x)$ applied elementwise.
  • relu_backward returns the gradient of the loss w.r.t. x, same shape. At exactly $x = 0$ the sub-gradient is 0.

Examples

Example 1 — forward zeroes the negatives

Input:  x = [-2.0, -0.1, 0.0, 0.1, 5.0]
Output: [0.0, 0.0, 0.0, 0.1, 5.0]

Explanation: every negative entry maps to 0, 0 stays 0, and positive entries pass through unchanged.

Example 2 — backward routes the gradient through active units

Input:  x = [-1.0, 0.0, 2.0, -3.0, 4.0], grad_out = [1, 1, 1, 1, 1]
Output: [0.0, 0.0, 1.0, 0.0, 1.0]

Explanation: the mask $\mathbf{1}[x > 0]$ is [0,0,1,0,1], so the upstream gradient survives only at the positions where the input was strictly positive. At $x = 0$ the gradient is 0 (the chosen sub-gradient).
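With an all-ones upstream gradient the output is just the mask itself, which hides the routing. The sketch below reuses the x from Example 2 with a non-uniform grad_out (the grad_out values are made up for illustration) to show that each surviving entry keeps its own upstream value.

```python
import numpy as np

x = np.array([-1.0, 0.0, 2.0, -3.0, 4.0])
grad_out = np.array([0.5, 2.0, 3.0, 7.0, -1.5])  # illustrative values, not from the problem

mask = x > 0               # [False, False, True, False, True]
grad_in = grad_out * mask  # [0.0, 0.0, 3.0, 0.0, -1.5]
```

Only positions 2 and 4 were active in the forward pass, so only their upstream gradients (3.0 and -1.5) reach the input.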

Constraints

  • The sub-gradient at exactly $x = 0$ is 0: use a strict x > 0 mask (the convention PyTorch, JAX, and TensorFlow all follow).
  • relu_backward consumes the input x, not the output. (For ReLU specifically out > 0 ⟺ x > 0, but the course convention passes the input so it generalises to other activations.)
  • The boolean mask x > 0 is promoted to {0.0, 1.0} when multiplied by a float array, so no explicit .astype(float) is needed.
  • Output shape always matches the input; the analytic backward must match a finite-difference gradient (atol≈1e-3).
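The finite-difference requirement can be checked with a central-difference estimate. The sketch below is one way to do it, not the hidden test itself: numerical_grad, the weighted-sum loss, and the choice to keep every |x| at least 0.1 (so that no perturbation crosses the kink at 0) are all assumptions made for illustration.

```python
import numpy as np


def relu(x):
    return np.maximum(0, x)


def relu_backward(grad_out, x):
    return grad_out * (x > 0)


def numerical_grad(f, x, eps=1e-5):
    """Central-difference estimate of df/dx for a scalar-valued f (hypothetical helper)."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        orig = x[idx]
        x[idx] = orig + eps
        f_plus = f(x)
        x[idx] = orig - eps
        f_minus = f(x)
        x[idx] = orig  # restore the entry before moving on
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
# Magnitudes in [0.1, 1.0) with random signs, so every entry is away from 0.
x = rng.uniform(0.1, 1.0, size=(3, 4)) * rng.choice([-1.0, 1.0], size=(3, 4))
w = rng.standard_normal((3, 4))


def loss(z):
    # Scalar loss L = sum(w * relu(z)), so dL/dy = w.
    return np.sum(relu(z) * w)


analytic = relu_backward(w, x)
numeric = numerical_grad(loss, x)
assert np.allclose(analytic, numeric, atol=1e-3)
```

Away from 0, ReLU is piecewise linear, so the central difference agrees with the analytic gradient up to floating-point error. At exactly 0 it would return roughly half the upstream gradient rather than the chosen sub-gradient of 0, which is why this check avoids that point.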

Notes

  • Dying ReLU. Because the gradient is zero at and below 0, a unit whose pre-activation stays $\le 0$ across all training data receives zero gradient on every example, so its own weights stop updating and it cannot recover on its own. This is the motivation for variants like LeakyReLU and GELU.
  • Series. Step 2 of build-nn; this non-linearity sits between the linear layers of the MLP you assemble later in the track.
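The dying-ReLU note can be made concrete with a toy unit. Everything in this sketch is hypothetical (the data X, weights w, and bias b are chosen only so that the pre-activation is negative on every example); it shows that the gradients for the unit's parameters are then exactly zero.

```python
import numpy as np


def relu_backward(grad_out, x):
    return grad_out * (x > 0)


rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(100, 3))  # hypothetical dataset, features in [-1, 1)
w = np.array([0.5, -0.3, 0.2])             # hypothetical weights of one unit
b = -5.0                                   # large negative bias

# |X @ w| <= 0.5 + 0.3 + 0.2 = 1.0, so z <= -4 on every example.
z = X @ w + b

grad_out = np.ones_like(z)                 # pretend upstream gradient
grad_z = relu_backward(grad_out, z)        # all zeros: the unit is never active

grad_w = X.T @ grad_z                      # gradient for the weights
grad_b = grad_z.sum()                      # gradient for the bias
```

Both grad_w and grad_b are zero here, so a gradient step leaves w and b unchanged and the unit stays inactive on this data.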

This problem ships 5 hidden tests. They run in your browser via Pyodide — no backend, no submission queue. Press ▶ Run tests to execute.

  • relu: output shape matches input
  • relu: zeroes out negatives, keeps non-negatives
  • relu_backward: grad passes through where x > 0, blocked where x <= 0
  • relu_backward: at exactly x=0 sub-gradient is 0 (the convention)
  • relu_backward: matches finite-difference gradient