Conv2D backward (dx, dW)
Background
This is the backward pass through the stride-1, no-padding convolution from build-cnn-01. Backprop through a conv needs two gradients — one for the kernel (dW, so the optimiser can update the weights) and one for the input (dx, so the gradient can flow to earlier layers). Both fall out of the chain rule, and both can be computed with the same triple loop as the forward pass by accumulating contributions. The key idea: a single output cell was a dot product of one input patch with one kernel, so its upstream gradient flows back to exactly that patch and that kernel.
Problem statement
Implement conv2d_backward(grad_out, x, W) returning (dx, dW). Let grad_out. Forward (from build-cnn-01) was:
Differentiating the loss w.r.t. each kernel weight and each input pixel gives:
In practice you implement both by walking the same (o, i, j) loop as the forward pass: read the scalar , then accumulate dW[o] += g · x[:, i:i+kH, j:j+kW] and dx[:, i:i+kH, j:j+kW] += g · W[o].
Input
grad_out—np.ndarrayof shape(C_out, out_H, out_W): the upstream gradient of the loss w.r.t. the conv output.x—np.ndarrayof shape(C_in, H, W): the input that was fed to the forward pass.W—np.ndarrayof shape(C_out, C_in, kH, kW): the kernel from the forward pass.
Output
Returns a tuple (dx, dW):
dx— shape(C_in, H, W), the gradient w.r.t.x.dW— shape(C_out, C_in, kH, kW), the gradient w.r.t. the kernel.
Examples
Example 1 — one upstream gradient routes to one patch and one kernel
Input: x = np.arange(16).reshape(1, 4, 4)
W = [[[[1, 2],
[3, 4]]]] # shape (1, 1, 2, 2)
grad_out = zeros((1, 3, 3)); grad_out[0, 0, 0] = 1
Output: dW[0, 0] = [[0, 1], dx[0, 0:2, 0:2] = [[1, 2],
[4, 5]] [3, 4]]
(dx is 0 everywhere else)
Explanation: only output (0,0,0) has a nonzero upstream gradient (). In the forward pass that cell dotted the top-left input patch [[0,1],[4,5]] with the kernel. So dW[0,0] receives , and the top-left patch of dx receives . Everything else is untouched because is zero there.
Example 2 — zero upstream gradient gives zero gradients
Input: grad_out = np.zeros((1, 3, 3)), any x of shape (1,4,4), any W of shape (1,1,2,2)
Output: dx = zeros((1, 4, 4)), dW = zeros((1, 1, 2, 2))
Explanation: the backward pass is linear in grad_out, so no upstream signal means no gradient — dx and dW are all zeros (in the shapes of x and W).
Constraints
- Same regime as the forward pass: stride 1, no padding, cross-correlation (no kernel flip).
dxhas the shape ofx;dWhas the shape ofW.- Accumulate with
+=— an input pixel feeds multiple output cells, so its gradient is a sum over all of them; overwriting instead of adding is the classic bug. - Iterate exactly as the forward pass: for each
(o, i, j)the relevant input window isx[:, i:i+kH, j:j+kW]. - The diagnostic tests are finite-difference checks against
conv2d_naive(x, W).sum(), compared withatol=1e-3.
Notes
- Why finite differences. Comparing your analytic
dx/dWto a numerical gradient (perturb one element by , re-run the forward, divide) is exactly how you'd debug a real backprop bug — a wrong derivation shows up as an order-of-magnitude difference, not a subtle one. - Series. This completes the conv operations from
build-cnn-01;build-cnn-06composes the full forward of a tiny image classifier.
This problem ships 5 hidden tests. They run in your browser via Pyodide — no backend, no submission queue. Press ▶ Run tests to execute.
- •Returns two arrays with correct shapes
- •Diagnostic: dW matches finite-difference numerical gradient
- •Diagnostic: dx matches finite-difference numerical gradient
- •Zero grad_out -> zero gradients on both x and W
- •With a nonzero grad_out at a single position, dW reflects the corresponding x patch