Average pooling 2D forwardEasy

Average pooling 2D forward

Background

Average pooling is the gentler twin of max pooling: same window-and-stride sweep, same output-shape formula, but each window is reduced with its mean instead of its max. The two have different inductive biases — max pooling keeps the single strongest activation (sparse gradients, sharp feature detection), while average pooling blends the whole window (dense gradients, like a fixed low-pass filter). In modern architectures average pooling shows up most often as a global pool at the very end — collapsing a (C, H, W) map into a (C,) vector before the final linear classifier (ResNet, MobileNet, and Vision Transformers all do this).

Problem statement

Implement avg_pool_2d(x, kernel_size, stride): for each channel cc and output position (i,j)(i, j), take the mean over the k × k window:

out[c,i,j]=1k2u=0k1  v=0k1x[c,  is+u,  js+v]\text{out}[c, i, j] = \frac{1}{k^2} \sum_{u=0}^{k-1}\;\sum_{v=0}^{k-1} x\big[c,\; i\,s + u,\; j\,s + v\big]

with output spatial size (no padding, no dilation):

outH=Hks+1,outW=Wks+1\text{out}_H = \left\lfloor \frac{H - k}{s} \right\rfloor + 1, \qquad \text{out}_W = \left\lfloor \frac{W - k}{s} \right\rfloor + 1

Channels are pooled independently — the output keeps the same CC.

Input

  • xnp.ndarray of shape (C, H, W): the input feature map.
  • kernel_sizeint: the spatial size kk of the pooling window.
  • strideint: the step ss between windows.

Output

Returns an np.ndarray of shape (C, out_H, out_W) using the floor formula above.

Examples

Example 1 — classic 2×22\times2 average pooling

Input:  x = [[[ 1,  2,  3,  4],
              [ 5,  6,  7,  8],
              [ 9, 10, 11, 12],
              [13, 14, 15, 16]]]        # shape (1, 4, 4)
        kernel_size=2, stride=2
Output: [[[ 3.5,  5.5],
          [11.5, 13.5]]]                # shape (1, 2, 2)

Explanation: each output cell averages a non-overlapping 2×22\times2 block — top-left (1+2+5+6)/4=3.5(1+2+5+6)/4 = 3.5, top-right (3+4+7+8)/4=5.5(3+4+7+8)/4 = 5.5, and so on.

Example 2 — the divisor is k2k^2

Input:  x = np.ones((1, 4, 4)), kernel_size=2, stride=2
Output: shape (1, 2, 2), every cell = 1.0

Explanation: each 2×22\times2 window of ones sums to 4 and is divided by k2=4k^2 = 4, giving 1.01.0. Dividing by anything else (e.g. HWH\cdot W or s2s^2) would give the wrong answer here.

Constraints

  • Pool per channel — no mixing across the channel axis; CC is unchanged.
  • Divide each window sum by k2k^2 (the window size), not by HWH\cdot W or s2s^2.
  • Output sizing uses floor division: (Hk)/s+1\lfloor (H-k)/s \rfloor + 1.
  • The window for output (i, j) starts at (i*stride, j*stride).
  • kernel_size == stride == 1 is the identity; a constant input yields that same constant everywhere.

Notes

  • Smoothing, not routing. Because every element in a window contributes equally, average pooling spreads gradient evenly across the window on the backward pass — the opposite of max pooling's single-winner routing.
  • Series. This shares the exact loop structure of build-cnn-03 (max pool) with .max() swapped for .mean(); build-cnn-05 covers the conv backward pass and build-cnn-06 the full tiny-classifier forward.
Python
Loading...

This problem ships 6 hidden tests. They run in your browser via Pyodide — no backend, no submission queue. Press ▶ Run tests to execute.

  • Output shape: (C, out_H, out_W) per the standard formula
  • Diagnostic: 4x4 input, k=2 stride=2 produces correct 2x2 means
  • Constant input: output is the same constant everywhere
  • kernel=stride=1 is the identity
  • Avg pool divides by k*k (not by something else)
  • Pooling is per-channel — no mixing across the C axis