Conv2D forward (padding + stride)
Background
This is the convolution real CNNs actually use: the cross-correlation of build-cnn-01 plus two extra knobs, zero-padding and stride. Padding adds a border of zeros so the kernel can sit over the edges (and, with the right amount, keeps the spatial size unchanged). Stride is how far the kernel hops each step, which downsamples the feature map. Together they are the bedrock of CNN architectures: stack padded stride-1 convs to learn features without shrinking, and use stride-2 convs to halve resolution.
Problem statement
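Implement conv2d(x, W, padding=0, stride=1): zero-pad the input by padding on each spatial side, then slide the kernel with step stride, computing cross-correlation (no kernel flip). Let $\tilde{x}$ be x zero-padded by $p$ on each side of $H$ and $W$. For each output channel $o$ and position $(i, j)$:
$$\text{out}[o, i, j] = \sum_{c=0}^{C_{in}-1} \sum_{a=0}^{kH-1} \sum_{b=0}^{kW-1} W[o, c, a, b]\,\tilde{x}[c,\; i s + a,\; j s + b]$$
with output spatial size (floor division, $p$ = padding, $s$ = stride):
$$\text{out\_H} = \left\lfloor \frac{H + 2p - kH}{s} \right\rfloor + 1, \qquad \text{out\_W} = \left\lfloor \frac{W + 2p - kW}{s} \right\rfloor + 1$$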
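A minimal naive-loop sketch of the definition above, assuming NumPy. It is one straightforward way to meet the spec, not the reference solution the tests use; the local names (xp, r, c) are illustrative.

```python
import numpy as np

def conv2d(x, W, padding=0, stride=1):
    """Naive 2-D cross-correlation with zero-padding and stride.

    x: (C_in, H, W_in) input; W: (C_out, C_in, kH, kW) kernel.
    Returns an array of shape (C_out, out_H, out_W).
    """
    C_out, C_in, kH, kW = W.shape
    # Zero-pad only the two spatial axes; np.pad's default mode is "constant" (zeros).
    xp = np.pad(x, ((0, 0), (padding, padding), (padding, padding)))
    Hp, Wp = xp.shape[1], xp.shape[2]
    # Floor-division output size.
    out_H = (Hp - kH) // stride + 1
    out_W = (Wp - kW) // stride + 1
    out = np.zeros((C_out, out_H, out_W), dtype=np.result_type(x, W, float))
    for co in range(C_out):
        for i in range(out_H):
            for j in range(out_W):
                r, c = i * stride, j * stride          # top-left corner of the window
                window = xp[:, r:r + kH, c:c + kW]     # shape (C_in, kH, kW)
                # No kernel flip: element-wise product, summed over channels and window.
                out[co, i, j] = np.sum(window * W[co])
    return out

# Example 1 from the problem statement.
y = conv2d(np.ones((1, 3, 3)), np.ones((1, 1, 3, 3)), padding=1, stride=1)
print(y.shape, y[0, 0, 0], y[0, 1, 1])  # (1, 3, 3) 4.0 9.0
```

The triple loop is slow, but it maps one-to-one onto the sum, which makes it a convenient correctness baseline for faster versions.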
Input
- x: np.ndarray of shape (C_in, H, W), the input feature map.
- W: np.ndarray of shape (C_out, C_in, kH, kW), the kernel.
- padding: int, the number of zero cells added on each side of H and W (the input becomes (C_in, H+2p, W+2p)).
- stride: int, the step size when sliding the kernel.
Output
Returns an np.ndarray of shape (C_out, out_H, out_W) using the floor-division formula above.
Examples
Example 1: zero-padding, "same" output ($p = (k - 1)/2 = 1$)
Input: x = np.ones((1, 3, 3)), W = np.ones((1, 1, 3, 3)), padding=1, stride=1
Output: shape (1, 3, 3); out[0,0,0] = 4, out[0,1,1] = 9
Explanation: padding 1 turns the 3×3 input into a 5×5 map with a zero border, and with a 3×3 kernel at stride 1 the output stays 3×3 ("same" padding). The top-left window straddles the corner, where only the inner 2×2 block of ones survives → sum 4. The centre window covers a full 3×3 block of ones → sum 9. The corner being 4 (not 9) proves the padding is zeros, not edge-replication.
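Example 1 can be checked by hand with np.pad alone: with a single channel and a 3×3 kernel of ones, each output is just the sum of its 3×3 window. The sketch below assumes NumPy and contrasts the zero padding the problem requires with mode="edge", which it forbids.

```python
import numpy as np

x = np.ones((3, 3))
zero_pad = np.pad(x, 1)                 # default mode="constant": zero border, shape (5, 5)
edge_pad = np.pad(x, 1, mode="edge")    # edge replication (NOT what conv2d should do)

# The window for output (0, 0) starts at (0, 0); for output (1, 1) it starts at (1, 1).
corner_zero = zero_pad[0:3, 0:3].sum()  # only the inner 2x2 block of ones
centre_zero = zero_pad[1:4, 1:4].sum()  # full 3x3 block of ones
corner_edge = edge_pad[0:3, 0:3].sum()  # border replicated from edge ones

print(zero_pad.shape, corner_zero, centre_zero, corner_edge)  # (5, 5) 4.0 9.0 9.0
```

With edge replication the corner would also read 9, which is exactly what the "pads with zeros" test is designed to catch.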
Example 2: stride 2 downsamples
Input: x shape (2, 8, 8), W shape (3, 2, 2, 2), padding=0, stride=2
Output: shape (3, 4, 4)
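Explanation: $\text{out\_H} = \text{out\_W} = \lfloor (8 + 0 - 2)/2 \rfloor + 1 = 4$. A stride of 2 advances the window two cells at a time, roughly halving each spatial dimension (here exactly, 8 → 4). The cross-channel reduction is unchanged from build-cnn-01: each output still sums over both input channels.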
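The sizing arithmetic is worth isolating because it is where most off-by-one bugs hide. Below is a small sketch; conv_out_size is a hypothetical helper name, not part of the required API.

```python
def conv_out_size(n, k, padding=0, stride=1):
    """Output length along one spatial axis: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - k) // stride + 1

# Example 2: n = 8, k = 2, p = 0, s = 2 -> (8 - 2) // 2 + 1 = 4
print(conv_out_size(8, 2, padding=0, stride=2))  # 4
# Example 1: n = 3, k = 3, p = 1, s = 1 -> "same" size 3
print(conv_out_size(3, 3, padding=1, stride=1))  # 3
# Floor division drops a trailing partial window: (7 - 2) // 2 + 1 = 3
print(conv_out_size(7, 2, padding=0, stride=2))  # 3
```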
Constraints
- Padding is zeros only; never edge-replicate or reflect (np.pad(x, ((0,0),(p,p),(p,p))) defaults to zeros).
- Output sizing uses floor division: $\text{out\_H} = \lfloor (H + 2p - kH)/s \rfloor + 1$, and likewise $\text{out\_W} = \lfloor (W + 2p - kW)/s \rfloor + 1$.
- The window for output (i, j) starts at (i*stride, j*stride) in the padded input.
- "Same" padding for an odd kernel is $p = (k - 1)/2$ with stride 1; it preserves $H$ and $W$.
- Cross-correlation, no kernel flip (as in build-cnn-01). Tests compare against a reference with atol=1e-9.
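The window-start rule above also allows a loop-free formulation: take every stride-1 window, then keep every stride-th one. The sketch below assumes NumPy ≥ 1.20 (for sliding_window_view). conv2d_vectorized is a hypothetical name, and this is an alternative technique, not the one the problem prescribes.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_vectorized(x, W, padding=0, stride=1):
    """Cross-correlation via strided window views and einsum (no Python loops)."""
    kH, kW = W.shape[2], W.shape[3]
    # Zero-pad only the spatial axes (H and W), never the channel axis.
    xp = np.pad(x, ((0, 0), (padding, padding), (padding, padding)))
    # All stride-1 windows: shape (C_in, Hp - kH + 1, Wp - kW + 1, kH, kW).
    win = sliding_window_view(xp, (kH, kW), axis=(1, 2))
    # Keep every stride-th window, so window (i, j) starts at (i*stride, j*stride).
    # This yields exactly floor((Hp - kH) / stride) + 1 rows (and likewise columns).
    win = win[:, ::stride, ::stride]
    # Sum over input channels c and kernel offsets (a, b); no kernel flip.
    return np.einsum("chwab,ocab->ohw", win, W)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 5, 5))
W = rng.standard_normal((3, 2, 3, 3))
y = conv2d_vectorized(x, W, padding=1, stride=2)
print(y.shape)  # (3, 3, 3)
```

The views are read-only and share memory with the padded array, so the only large allocation is the einsum output.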
Notes
- Why padding. Without it, every stride-1 conv shrinks the map by $k - 1$ along each spatial axis; "same" padding lets you stack many conv layers and only downsample deliberately (via stride or pooling).
- Series. Builds on build-cnn-01 (naive conv); build-cnn-03/04 add pooling, build-cnn-05 the backward pass, and build-cnn-06 composes a full tiny-classifier forward.
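The "Why padding" note can be made concrete with a few lines of size arithmetic. The sketch below assumes a 32-pixel input and five stacked 3×3 layers; both numbers are illustrative choices, not taken from the problem.

```python
k = 3                   # odd kernel size
p_same = (k - 1) // 2   # "same" padding for stride 1 -> 1
h_valid = h_same = 32   # starting spatial size (illustrative)

for _ in range(5):  # five stacked stride-1 conv layers
    h_valid = (h_valid + 2 * 0 - k) // 1 + 1      # no padding: loses k - 1 = 2 per layer
    h_same = (h_same + 2 * p_same - k) // 1 + 1   # "same" padding: size unchanged

print(h_valid, h_same)  # 22 32

# Downsampling then happens only where you ask for it, e.g. one stride-2 conv with p = 1:
h_down = (h_same + 2 * 1 - k) // 2 + 1
print(h_down)  # 16
```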
This problem ships 5 hidden tests, which run in your browser via Pyodide (no backend, no submission queue):
- With padding=0, stride=1: matches the no-pad reference
- Diagnostic: padding=(k-1)//2, stride=1 preserves spatial size ('same' padding)
- Stride 2 halves the spatial size (with appropriate padding)
- Combined padding=1 and stride=2 matches reference
- Padding actually pads with zeros (not edge-replicate, not reflect)