Mini-CNN forward (capstone)

Difficulty: Medium

Background

This is the capstone of the build-a-CNN series: wire the convolution, ReLU, and max-pooling primitives from the earlier problems into a complete forward pass for a tiny 2-class image classifier. Every piece is something you have already built — the task is to thread the shapes through the standard CNN pipeline (convolve, non-linearity, downsample, flatten, linear classifier, softmax). It is LeNet-5 in miniature.

Problem statement

Implement mini_cnn_forward(x, W_conv, W_fc), the forward pass that chains:

$$\text{conv}_{3\times3} \;\to\; \text{ReLU} \;\to\; \text{maxpool}_{2\times2,\; s=2} \;\to\; \text{flatten} \;\to\; \cdot\,W_{fc} \;\to\; \text{softmax}$$

The helpers conv2d_naive, relu, max_pool_2d, and softmax are provided in the starter — the body is essentially six lines wiring them in order. Return a 2-class probability distribution.
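The wiring described above can be sketched as follows. The starter's helper signatures are not shown here, so the four helpers below are hypothetical stand-ins written for this illustration (in particular, the `(x, k, s)` argument order of `max_pool_2d` is an assumption). Only the body of `mini_cnn_forward` reflects the pipeline in the problem statement.

```python
import numpy as np

# --- Hypothetical stand-ins for the starter helpers (signatures assumed) ---

def conv2d_naive(x, W):
    """Valid (no padding, stride 1) cross-correlation.
    x: (C_in, H, W_in), W: (C_out, C_in, kH, kW) -> (C_out, H-kH+1, W_in-kW+1)."""
    C_out, _, kH, kW = W.shape
    _, H, W_in = x.shape
    out = np.zeros((C_out, H - kH + 1, W_in - kW + 1))
    for o in range(C_out):
        for i in range(H - kH + 1):
            for j in range(W_in - kW + 1):
                out[o, i, j] = np.sum(x[:, i:i + kH, j:j + kW] * W[o])
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool_2d(x, k=2, s=2):
    """Max pooling over each channel. x: (C, H, W)."""
    C, H, W = x.shape
    H_out, W_out = (H - k) // s + 1, (W - k) // s + 1
    out = np.empty((C, H_out, W_out))
    for i in range(H_out):
        for j in range(W_out):
            out[:, i, j] = x[:, i * s:i * s + k, j * s:j * s + k].max(axis=(1, 2))
    return out

def softmax(z):
    e = np.exp(z - np.max(z))  # shift by the max for numerical stability
    return e / e.sum()

# --- The six-line wiring ---

def mini_cnn_forward(x, W_conv, W_fc):
    feat = conv2d_naive(x, W_conv)   # (C_out, 6, 6)
    feat = relu(feat)                # (C_out, 6, 6)
    feat = max_pool_2d(feat, 2, 2)   # (C_out, 3, 3)
    flat = feat.reshape(-1)          # (C_out * 9,)
    logits = flat @ W_fc             # (2,)
    return softmax(logits)           # (2,), sums to 1
```

Calling it with random arrays of the documented shapes, for example `x` of shape `(1, 8, 8)`, `W_conv` of shape `(4, 1, 3, 3)`, and `W_fc` of shape `(36, 2)`, should return a length-2 array that sums to 1.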

Input

  • x: np.ndarray of shape (1, 8, 8). A single grayscale image (any range).
  • W_conv: np.ndarray of shape (C_out, 1, 3, 3). The convolution kernel.
  • W_fc: np.ndarray of shape (C_out * 3 * 3, 2). The fully-connected weights mapping the flattened features to 2 logits.

Output

Returns an np.ndarray of shape (2,) — a probability distribution over the 2 classes (non-negative, sums to 1).

Examples

Example 1 — shape walk through the pipeline ($C_\text{out}=4$)

x: (1, 8, 8), W_conv: (4, 1, 3, 3), W_fc: (36, 2)
  conv2d_naive ->  (4, 6, 6)     # 8 - 3 + 1 = 6
  relu         ->  (4, 6, 6)     # shape unchanged
  max_pool 2,2 ->  (4, 3, 3)     # (6 - 2)//2 + 1 = 3
  flatten      ->  (36,)         # 4 * 3 * 3 = 36
  @ W_fc       ->  (2,)          # logits
  softmax      ->  (2,)          # probabilities, sum = 1

Explanation: the conv produces a $6\times6$ map per filter, pooling halves it to $3\times3$, and flattening gives $4\cdot 9 = 36$ features, exactly W_fc's first dimension. The result is a length-2 distribution summing to 1.
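The shape arithmetic in this walk-through can be checked without running any convolution. The sketch below uses the standard output-size formula `(n + 2*pad - k) // stride + 1`; the function names `out_size` and `mini_cnn_shapes` are made up for this illustration and are not part of the starter.

```python
def out_size(n, k, stride=1, pad=0):
    """Spatial output size of a conv or pool window sliding over n positions."""
    return (n + 2 * pad - k) // stride + 1

def mini_cnn_shapes(c_out, size=8):
    """Shapes after each stage for a (1, size, size) input."""
    conv = out_size(size, 3)            # 3x3 conv, no padding, stride 1
    pool = out_size(conv, 2, stride=2)  # 2x2 max-pool, stride 2
    return {
        "conv": (c_out, conv, conv),
        "pool": (c_out, pool, pool),
        "flat": (c_out * pool * pool,),
    }

shapes = mini_cnn_shapes(4)
print(shapes)  # {'conv': (4, 6, 6), 'pool': (4, 3, 3), 'flat': (36,)}
```

The `flat` entry is the value that must match the first dimension of `W_fc`.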

Example 2 — swapping the classifier columns flips the prediction

out_a = mini_cnn_forward(x, W_conv, W_fc)
out_b = mini_cnn_forward(x, W_conv, W_fc[:, ::-1])   # swap the two fc columns
=> argmax(out_a) != argmax(out_b)        (unless the two logits are tied)

Explanation: the two columns of W_fc produce the two class logits. Swapping them swaps the logits, so the predicted class flips — a sanity check that the classification head actually depends on W_fc.
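Because everything before the linear layer is independent of `W_fc`, the swap can be illustrated on the classification head alone. The sketch below substitutes a random 36-vector for the pooled, flattened features (an assumption made to keep the example self-contained) and uses a local `softmax` rather than the starter's.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # shift by the max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
flat = rng.standard_normal(36)       # stand-in for the flattened features
W_fc = rng.standard_normal((36, 2))

p_a = softmax(flat @ W_fc)
p_b = softmax(flat @ W_fc[:, ::-1])  # columns swapped, so logits are swapped

# The two probabilities trade places, so the argmax flips unless they are tied.
print(p_a, p_b)
print(np.argmax(p_a), np.argmax(p_b))
```

Swapping columns permutes the logits, and softmax applied to a permuted vector gives the same permutation of the probabilities, which is why `p_b` is `p_a` reversed.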

Constraints

  • Apply the stages in order: conv → ReLU → max-pool (kernel 2, stride 2) → flatten → linear (@ W_fc) → softmax.
  • Shapes must thread exactly: $8\times8 \to 6\times6$ (3×3 conv, no pad) $\to 3\times3$ (2×2 pool, stride 2) $\to C_\text{out}\cdot 9$ features $\to (2,)$.
  • The output is a valid probability distribution (use the provided numerically-stable softmax); it sums to 1 within atol=1e-7.
  • The function is deterministic — identical inputs give identical outputs.
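For reference, here is one common way a numerically stable softmax is written; the provided helper may differ in detail, and the name `softmax_stable` is used only for this sketch. Subtracting the maximum logit before exponentiating leaves the result mathematically unchanged but keeps every exponent at or below zero, so `np.exp` cannot overflow.

```python
import numpy as np

def softmax_stable(z):
    """Softmax over a 1-D array of logits, shifted by the max to avoid overflow."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())  # largest exponent is exp(0) = 1
    return e / e.sum()

# Logits this large would overflow a naive np.exp(z) in float64.
p = softmax_stable([1000.0, 999.0])
print(p, p.sum())

# Shifting all logits by a constant does not change the distribution.
q = softmax_stable([1.0, 0.0])
print(np.allclose(p, q))  # True
```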

Notes

  • LeNet-5 in miniature. Same building blocks (conv, non-linearity, pooling, a linear head) — real CNNs just stack more of them with more channels. After this you can read LeCun et al. (1998) and recognise every piece.
  • Series. This composes build-cnn-01 (conv), ReLU, and build-cnn-03 (max pool) with a softmax classifier — the finale of the build-cnn track.

This problem ships 5 hidden tests. They run in your browser via Pyodide — no backend, no submission queue. Press ▶ Run tests to execute.

  • Output is a 2-element 1-D array
  • Output sums to 1 (valid probability distribution)
  • Determinism: same inputs -> same output
  • Diagnostic: produces the expected logits chain end-to-end
  • Argmax flips when fc weights are flipped (sanity that classification works)