Average pooling 2D forward
Background
Average pooling is the gentler twin of max pooling: it uses the same window-and-stride sweep and the same output-shape formula, but reduces each window to its mean instead of its max. The two have different inductive biases: max pooling keeps the single strongest activation (sparse gradients, sharp feature detection), while average pooling blends the whole window (dense gradients, acting like a fixed low-pass filter). In modern architectures average pooling shows up most often as a global pool at the very end, collapsing a (C, H, W) map into a (C,) vector before the final linear classifier. ResNet and MobileNet do this, as do some Vision Transformer variants (the original ViT classifies from a class token instead).
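As a small illustration of the global pool described above, the following sketch collapses a (C, H, W) map into a (C,) vector with NumPy. The array values and the variable names `feature_map` and `pooled` are made up for the example.

```python
import numpy as np

# Hypothetical feature map: 3 channels on a 2x2 spatial grid.
feature_map = np.arange(12, dtype=np.float64).reshape(3, 2, 2)

# Global average pooling: reduce both spatial axes, keep the channel axis.
pooled = feature_map.mean(axis=(1, 2))

print(pooled.shape)     # (3,)
print(pooled.tolist())  # [1.5, 5.5, 9.5]
```

This is the special case where the window covers the whole map, so there is exactly one output position per channel.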
Problem statement
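Implement avg_pool_2d(x, kernel_size, stride): for each channel $c$ and output position $(i, j)$, take the mean over the $k \times k$ window, where $k$ is kernel_size and $s$ is stride:
$$\text{out}[c, i, j] = \frac{1}{k^2} \sum_{u=0}^{k-1} \sum_{v=0}^{k-1} x[c,\; i s + u,\; j s + v]$$
with output spatial size (no padding, no dilation):
$$\text{out\_H} = \left\lfloor \frac{H - k}{s} \right\rfloor + 1, \qquad \text{out\_W} = \left\lfloor \frac{W - k}{s} \right\rfloor + 1$$
Channels are pooled independently, so the output keeps the same $C$.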
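The output size along one axis is floor((size - kernel_size) / stride) + 1, which in Python is a single floor division. A minimal sketch, where the helper name `pooled_size` is an assumption made for this example:

```python
def pooled_size(size: int, kernel_size: int, stride: int) -> int:
    """Number of window positions along one axis (no padding, no dilation)."""
    return (size - kernel_size) // stride + 1

print(pooled_size(4, 2, 2))  # 2
print(pooled_size(5, 2, 2))  # 2: the trailing row or column is never covered
print(pooled_size(7, 3, 2))  # 3
```

The second call shows the effect of the floor: when the windows do not tile the input exactly, the leftover edge is simply dropped.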
Input
- x (np.ndarray of shape (C, H, W)): the input feature map.
- kernel_size (int): the spatial size of the pooling window.
- stride (int): the step between windows.
Output
Returns an np.ndarray of shape (C, out_H, out_W) using the floor formula above.
Examples
Example 1 — classic average pooling
Input: x = [[[ 1,  2,  3,  4],
             [ 5,  6,  7,  8],
             [ 9, 10, 11, 12],
             [13, 14, 15, 16]]]   # shape (1, 4, 4)
       kernel_size=2, stride=2
Output: [[[ 3.5,  5.5],
          [11.5, 13.5]]]   # shape (1, 2, 2)
Explanation: each output cell averages a non-overlapping $2 \times 2$ block: top-left $(1 + 2 + 5 + 6)/4 = 3.5$, top-right $(3 + 4 + 7 + 8)/4 = 5.5$, and so on.
Example 2 — the divisor is $k^2$
Input: x = np.ones((1, 4, 4)), kernel_size=2, stride=2
Output: shape (1, 2, 2), every cell = 1.0
Explanation: each window of ones sums to 4 and is divided by $k^2 = 4$, giving $1.0$. Dividing by any other quantity would give the wrong answer here; dividing by $k = 2$, for instance, would yield $2.0$.
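Both examples can be checked by hand with a reshape trick. The sketch below is not a general solution: it assumes stride equals kernel_size and that H and W are multiples of it, and the helper name `block_sums` is made up for the illustration.

```python
import numpy as np

k = 2  # kernel_size == stride == 2, as in both examples

def block_sums(x: np.ndarray) -> np.ndarray:
    """Sum each non-overlapping k x k block (requires H and W divisible by k)."""
    C, H, W = x.shape
    # Split each spatial axis into (block index, offset inside block).
    return x.reshape(C, H // k, k, W // k, k).sum(axis=(2, 4))

x1 = np.arange(1, 17, dtype=np.float64).reshape(1, 4, 4)  # Example 1 input
ex1 = block_sums(x1) / (k * k)
print(ex1.tolist())  # [[[3.5, 5.5], [11.5, 13.5]]]

ones = np.ones((1, 4, 4))                                  # Example 2 input
right = block_sums(ones) / (k * k)  # divide by k^2: every cell is 1.0
wrong = block_sums(ones) / k        # divide by k:   every cell is 2.0
print(right.tolist(), wrong.tolist())
```

The constant-ones input makes a wrong divisor obvious, because the correct output must reproduce the constant.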
Constraints
- Pool per channel: no mixing across the channel axis; $C$ is unchanged.
- Divide each window sum by $k^2$ (the window size), not by any other quantity such as $k$.
- Output sizing uses floor division: out_H = (H - kernel_size) // stride + 1, and likewise for out_W.
- The window for output (i, j) starts at (i*stride, j*stride).
- kernel_size == stride == 1 is the identity; a constant input yields that same constant everywhere.
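Taken together, the constraints above suggest a straightforward loop over output positions. The following is one possible sketch, not necessarily the intended reference solution; returning a float64 array is an assumption the problem statement does not spell out.

```python
import numpy as np

def avg_pool_2d(x: np.ndarray, kernel_size: int, stride: int) -> np.ndarray:
    """Average-pool a (C, H, W) array with no padding and no dilation."""
    C, H, W = x.shape
    out_h = (H - kernel_size) // stride + 1  # floor division
    out_w = (W - kernel_size) // stride + 1
    out = np.zeros((C, out_h, out_w), dtype=np.float64)  # assumed output dtype
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride  # top-left corner of this window
            window = x[:, r:r + kernel_size, c:c + kernel_size]
            # Sum over the spatial axes only, so channels never mix,
            # then divide by the window size k * k.
            out[:, i, j] = window.sum(axis=(1, 2)) / (kernel_size * kernel_size)
    return out

x = np.arange(1, 17).reshape(1, 4, 4)
print(avg_pool_2d(x, 2, 2).tolist())  # [[[3.5, 5.5], [11.5, 13.5]]]
```

Looping over output positions rather than channels keeps the Python-level loop count at out_H * out_W and lets NumPy handle the channel axis in one slice.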
Notes
- Smoothing, not routing. Because every element in a window contributes equally, average pooling spreads gradient evenly across the window on the backward pass — the opposite of max pooling's single-winner routing.
- Series. This shares the exact loop structure of build-cnn-03 (max pool) with .max() swapped for .mean(); build-cnn-05 covers the conv backward pass and build-cnn-06 the full tiny-classifier forward.
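To make the "smoothing, not routing" note concrete, here is a rough sketch of how the upstream gradient could be distributed on the backward pass. The backward pass is not part of this problem; the function name `avg_pool_2d_backward` and its signature are assumptions for illustration only.

```python
import numpy as np

def avg_pool_2d_backward(grad_out, x_shape, kernel_size, stride):
    """Spread each output gradient evenly over its input window."""
    grad_x = np.zeros(x_shape, dtype=np.float64)
    _, out_h, out_w = grad_out.shape
    share = 1.0 / (kernel_size * kernel_size)  # each element's weight in the mean
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            # grad_out[:, i, j] has shape (C,); broadcast it over the k x k window.
            # "+=" accumulates contributions where windows overlap.
            grad_x[:, r:r + kernel_size, c:c + kernel_size] += (
                grad_out[:, i, j][:, None, None] * share
            )
    return grad_x

grad_out = np.ones((1, 2, 2))
grad_x = avg_pool_2d_backward(grad_out, (1, 4, 4), 2, 2)
print(grad_x[0, 0].tolist())  # [0.25, 0.25, 0.25, 0.25]
```

Every input element receives 1/k² of its window's gradient, whereas max pooling would send the full gradient to a single element per window.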
This problem ships 6 hidden tests. They run in your browser via Pyodide, with no backend and no submission queue.
- Output shape: (C, out_H, out_W) per the standard formula
- Diagnostic: 4x4 input, k=2 stride=2 produces correct 2x2 means
- Constant input: output is the same constant everywhere
- kernel=stride=1 is the identity
- Avg pool divides by k*k (not by something else)
- Pooling is per-channel — no mixing across the C axis
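The properties listed above can be approximated locally before running the hidden tests. The checks below are my own guesses based on the test names, not the actual hidden tests, and they exercise a hypothetical vectorised variant (`avg_pool_2d_windows`) built on NumPy's `sliding_window_view` (available in NumPy 1.20 and later).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def avg_pool_2d_windows(x, kernel_size, stride):
    """Vectorised average pooling sketch (hypothetical helper name)."""
    # All k x k windows at stride 1: shape (C, H-k+1, W-k+1, k, k).
    windows = sliding_window_view(x, (kernel_size, kernel_size), axis=(1, 2))
    # Keep every stride-th window start, then average each window.
    return windows[:, ::stride, ::stride].mean(axis=(-2, -1))

x4 = np.arange(1, 17, dtype=np.float64).reshape(1, 4, 4)

# Output shape follows the floor formula.
assert avg_pool_2d_windows(np.zeros((3, 7, 5)), 3, 2).shape == (3, 3, 2)
# 4x4 diagnostic with k=2, stride=2.
assert np.allclose(avg_pool_2d_windows(x4, 2, 2), [[[3.5, 5.5], [11.5, 13.5]]])
# Constant input stays constant.
assert np.allclose(avg_pool_2d_windows(np.full((2, 5, 5), 7.0), 3, 1), 7.0)
# kernel_size == stride == 1 is the identity.
assert np.array_equal(avg_pool_2d_windows(x4, 1, 1), x4)
# Divisor is k * k: windows of ones average to 1.
assert np.allclose(avg_pool_2d_windows(np.ones((1, 4, 4)), 2, 2), 1.0)
# Channels do not mix: an all-zero channel stays zero next to an all-one channel.
two = np.stack([np.zeros((4, 4)), np.ones((4, 4))])
out = avg_pool_2d_windows(two, 2, 2)
assert np.all(out[0] == 0.0) and np.all(out[1] == 1.0)
print("all checks passed")
```

Swapping `avg_pool_2d_windows` for your own `avg_pool_2d` in these assertions gives a quick sanity check of a loop-based solution.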