Conv2D backward (dx, dW)Hard

Conv2D backward (dx, dW)

Background

This is the backward pass through the stride-1, no-padding convolution from build-cnn-01. Backprop through a conv needs two gradients — one for the kernel (dW, so the optimiser can update the weights) and one for the input (dx, so the gradient can flow to earlier layers). Both fall out of the chain rule, and both can be computed with the same triple loop as the forward pass by accumulating contributions. The key idea: a single output cell was a dot product of one input patch with one kernel, so its upstream gradient flows back to exactly that patch and that kernel.

Problem statement

Implement conv2d_backward(grad_out, x, W) returning (dx, dW). Let G=G = grad_out. Forward (from build-cnn-01) was:

out[o,i,j]=c,u,vx[c,i+u,j+v]  W[o,c,u,v]\text{out}[o, i, j] = \sum_{c,\,u,\,v} x[c,\, i+u,\, j+v]\; W[o, c, u, v]

Differentiating the loss w.r.t. each kernel weight and each input pixel gives:

dW[o,c,u,v]=i,jG[o,i,j]  x[c,i+u,j+v]dW[o, c, u, v] = \sum_{i,\,j} G[o, i, j]\; x[c,\, i+u,\, j+v] dx[c,h,w]=o,i,j,u,vi+u=h,j+v=wG[o,i,j]  W[o,c,u,v]dx[c, h, w] = \sum_{\substack{o,\,i,\,j,\,u,\,v \\ i+u=h,\; j+v=w}} G[o, i, j]\; W[o, c, u, v]

In practice you implement both by walking the same (o, i, j) loop as the forward pass: read the scalar g=G[o,i,j]g = G[o,i,j], then accumulate dW[o] += g · x[:, i:i+kH, j:j+kW] and dx[:, i:i+kH, j:j+kW] += g · W[o].

Input

  • grad_outnp.ndarray of shape (C_out, out_H, out_W): the upstream gradient of the loss w.r.t. the conv output.
  • xnp.ndarray of shape (C_in, H, W): the input that was fed to the forward pass.
  • Wnp.ndarray of shape (C_out, C_in, kH, kW): the kernel from the forward pass.

Output

Returns a tuple (dx, dW):

  • dx — shape (C_in, H, W), the gradient w.r.t. x.
  • dW — shape (C_out, C_in, kH, kW), the gradient w.r.t. the kernel.

Examples

Example 1 — one upstream gradient routes to one patch and one kernel

Input:  x = np.arange(16).reshape(1, 4, 4)
        W = [[[[1, 2],
               [3, 4]]]]                      # shape (1, 1, 2, 2)
        grad_out = zeros((1, 3, 3)); grad_out[0, 0, 0] = 1
Output: dW[0, 0] = [[0, 1],     dx[0, 0:2, 0:2] = [[1, 2],
                    [4, 5]]                        [3, 4]]
        (dx is 0 everywhere else)

Explanation: only output (0,0,0) has a nonzero upstream gradient (g=1g=1). In the forward pass that cell dotted the top-left 2×22\times2 input patch [[0,1],[4,5]] with the kernel. So dW[0,0] receives g(that patch)g\cdot(\text{that patch}), and the top-left patch of dx receives gW[0,0]g\cdot W[0,0]. Everything else is untouched because gg is zero there.

Example 2 — zero upstream gradient gives zero gradients

Input:  grad_out = np.zeros((1, 3, 3)), any x of shape (1,4,4), any W of shape (1,1,2,2)
Output: dx = zeros((1, 4, 4)), dW = zeros((1, 1, 2, 2))

Explanation: the backward pass is linear in grad_out, so no upstream signal means no gradient — dx and dW are all zeros (in the shapes of x and W).

Constraints

  • Same regime as the forward pass: stride 1, no padding, cross-correlation (no kernel flip).
  • dx has the shape of x; dW has the shape of W.
  • Accumulate with += — an input pixel feeds multiple output cells, so its gradient is a sum over all of them; overwriting instead of adding is the classic bug.
  • Iterate exactly as the forward pass: for each (o, i, j) the relevant input window is x[:, i:i+kH, j:j+kW].
  • The diagnostic tests are finite-difference checks against conv2d_naive(x, W).sum(), compared with atol=1e-3.

Notes

  • Why finite differences. Comparing your analytic dx/dW to a numerical gradient (perturb one element by ±ϵ\pm\epsilon, re-run the forward, divide) is exactly how you'd debug a real backprop bug — a wrong derivation shows up as an order-of-magnitude difference, not a subtle one.
  • Series. This completes the conv operations from build-cnn-01; build-cnn-06 composes the full forward of a tiny image classifier.
Python
Loading...

This problem ships 5 hidden tests. They run in your browser via Pyodide — no backend, no submission queue. Press ▶ Run tests to execute.

  • Returns two arrays with correct shapes
  • Diagnostic: dW matches finite-difference numerical gradient
  • Diagnostic: dx matches finite-difference numerical gradient
  • Zero grad_out -> zero gradients on both x and W
  • With a nonzero grad_out at a single position, dW reflects the corresponding x patch