MSE loss + backward
Background
Mean squared error is the regression workhorse — used wherever you predict continuous values: pixel intensities in autoencoders, value functions in RL, denoising targets in diffusion models. It is the average of the squared differences between predictions and targets. Implementing the forward and backward together shows how a scalar loss seeds the entire backward pass: its gradient w.r.t. the predictions is the very first grad_out that flows back into the network.
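To make the "first grad_out" idea concrete, here is a minimal sketch that assumes a single hypothetical linear layer pred = x @ W + b. The names x, W, b and the shapes are illustrative and not part of this problem. The MSE gradient is handed directly to the layer's backward as grad_out.

```python
import numpy as np

def mse_loss(pred, target):
    # Average over every element, so N = pred.size
    return float(np.mean((pred - target) ** 2))

def mse_loss_backward(pred, target):
    # dL/dpred = 2 * (pred - target) / N, same shape as pred
    return 2.0 * (pred - target) / pred.size

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))        # hypothetical batch: 8 inputs, 3 features
W = rng.normal(size=(3, 2))        # hypothetical layer weights
b = np.zeros(2)                    # hypothetical layer bias
target = rng.normal(size=(8, 2))

pred = x @ W + b                   # forward through the layer
loss = mse_loss(pred, target)

grad_out = mse_loss_backward(pred, target)  # seeds the backward pass
grad_W = x.T @ grad_out            # chain rule into the weights
grad_b = grad_out.sum(axis=0)      # bias gradient sums over the batch
grad_x = grad_out @ W.T            # would continue into earlier layers
```

Every later gradient in the network is built from grad_out, which is why a wrong N in the loss scales every parameter update.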
Problem statement
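Implement two functions, where $N$ is the total number of elements (pred.size):
- mse_loss(pred, target) returns the scalar $L = \frac{1}{N}\sum_{i=1}^{N}(\text{pred}_i - \text{target}_i)^2$.
- mse_loss_backward(pred, target) returns the gradient w.r.t. pred, $\frac{\partial L}{\partial\,\text{pred}} = \frac{2}{N}(\text{pred} - \text{target})$, an array of the same shape as pred.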
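The gradient follows from differentiating $L = \frac{1}{N}\sum_{i=1}^{N}(\text{pred}_i - \text{target}_i)^2$ term by term. Only the $j$-th squared term depends on $\text{pred}_j$, because $\partial\,\text{pred}_i / \partial\,\text{pred}_j$ is 1 when $i = j$ and 0 otherwise:

$$
\frac{\partial L}{\partial\,\text{pred}_j}
= \frac{1}{N}\sum_{i=1}^{N} 2(\text{pred}_i - \text{target}_i)\,\frac{\partial\,\text{pred}_i}{\partial\,\text{pred}_j}
= \frac{2}{N}(\text{pred}_j - \text{target}_j)
$$

The same $N$ appears in the forward and the backward, which is what makes the analytic and numerical gradients agree.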
Input
- pred: np.ndarray of any shape, the model predictions.
- target: np.ndarray of the same shape, the ground-truth values.
Output
- mse_loss returns a scalar float.
- mse_loss_backward returns an np.ndarray of the same shape as pred.
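A minimal NumPy sketch of both functions, offered as one possible solution rather than the reference one. Converting inputs with np.asarray(..., dtype=float) is a choice made here so that plain lists and integer arrays (like the inputs in Example 2) are handled the same way as float arrays.

```python
import numpy as np

def mse_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean of squared differences over ALL elements (N = pred.size)."""
    diff = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    return float(np.sum(diff ** 2) / diff.size)

def mse_loss_backward(pred: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Gradient of mse_loss w.r.t. pred: 2 * (pred - target) / N, shaped like pred."""
    diff = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    return 2.0 * diff / diff.size
```

Dividing by diff.size rather than len(diff) is what makes a 2-D batch average over every element instead of over rows only.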
Examples
Example 1 — forward on a vector
Input: pred = [1.0, 2.0, 3.0, 4.0], target = [1.5, 2.5, 2.5, 4.0]
Output: 0.1875
Explanation: the squared diffs are [0.25, 0.25, 0.25, 0.0]; their mean over all 4 elements is 0.75 / 4 = 0.1875.
Example 2 — 2-D input averages over every element
Input: pred = [[1, 2], [3, 4]], target = [[2, 2], [3, 5]]
Output: mse_loss = 0.5
mse_loss_backward = 2*(pred - target)/N, N = 4
= [[-0.5, 0.0], [0.0, -0.5]]
Explanation: the squared diffs are [[1, 0], [0, 1]]; the mean over all N = 4 elements is 2 / 4 = 0.5. The gradient divides by the same N = 4: e.g. 2*(1 - 2)/4 = -0.5.
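Both examples can be reproduced with a compact sketch of the two functions; np.mean already averages over every element of a 2-D array, so it uses the same N as the backward.

```python
import numpy as np

def mse_loss(pred, target):
    return float(np.mean((pred - target) ** 2))   # mean over all elements

def mse_loss_backward(pred, target):
    return 2.0 * (pred - target) / pred.size       # same N as the forward

# Example 1: 1-D vector
p1 = np.array([1.0, 2.0, 3.0, 4.0])
t1 = np.array([1.5, 2.5, 2.5, 4.0])
print(mse_loss(p1, t1))            # 0.1875

# Example 2: 2-D input, N = p2.size = 4 (len(p2) would be 2)
p2 = np.array([[1, 2], [3, 4]])
t2 = np.array([[2, 2], [3, 5]])
print(mse_loss(p2, t2))            # 0.5
grad = mse_loss_backward(p2, t2)   # values: [[-0.5, 0.0], [0.0, -0.5]]
```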
Constraints
- N is the total element count (pred.size), not len(pred); these differ for 2-D inputs (batch × features). The loss averages over all elements.
- The backward divides by the same N as the forward, so analytic and finite-difference gradients agree (atol ≈ 1e-4).
- mse_loss_backward returns an array shaped like pred; the loss is exactly 0 when pred == target.
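The finite-difference agreement can be checked locally. This is a sketch under assumptions: the hidden test's exact procedure is not given, so central differences with eps = 1e-6 and the helper numerical_grad are illustrative choices; only the atol ≈ 1e-4 tolerance comes from the constraints above.

```python
import numpy as np

def mse_loss(pred, target):
    return float(np.mean((pred - target) ** 2))

def mse_loss_backward(pred, target):
    return 2.0 * (pred - target) / pred.size

def numerical_grad(f, x, eps=1e-6):
    """Hypothetical helper: central-difference estimate of df/dx for scalar f."""
    x = np.array(x, dtype=float)          # float copy, safe to perturb
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):       # visit every element of any shape
        orig = x[idx]
        x[idx] = orig + eps
        f_plus = f(x)
        x[idx] = orig - eps
        f_minus = f(x)
        x[idx] = orig                     # restore before moving on
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

rng = np.random.default_rng(42)
pred = rng.normal(size=(4, 3))            # len(pred) == 4, pred.size == 12
target = rng.normal(size=(4, 3))

analytic = mse_loss_backward(pred, target)
numeric = numerical_grad(lambda p: mse_loss(p, target), pred)
print(np.allclose(analytic, numeric, atol=1e-4))   # True
```

Because the loss is quadratic in each element, the central difference matches the analytic gradient up to rounding error, so any mismatch points at a wrong divisor rather than at step-size noise.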
Notes
- Why pred.size. Averaging over all elements matches PyTorch's MSELoss(reduction='mean') default and keeps the gradient magnitude independent of batch size. Dividing by len(pred) (the leading-axis size) is the classic bug the finite-difference test catches.
- Series. Step 4 of build-nn; this loss closes the loop in the XOR-training capstone, where its gradient is backpropagated through the sigmoid and linear layers.
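To see the size of the len(pred) bug, this sketch compares a correct gradient with a deliberately buggy one on a 2-D batch. Both function names are hypothetical. For an input of shape (batch, features) the buggy version is too large by a factor of size / len = features, which is exactly the kind of mismatch a finite-difference check flags.

```python
import numpy as np

def grad_correct(pred, target):
    return 2.0 * (pred - target) / pred.size   # divide by every element

def grad_buggy(pred, target):
    return 2.0 * (pred - target) / len(pred)   # divides by the leading axis only

pred = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])  # shape (2, 3): len 2, size 6
target = np.zeros_like(pred)

ratio = grad_buggy(pred, target) / grad_correct(pred, target)
# ratio is (up to rounding) 3.0 everywhere: size / len, the number of features
```

For a 1-D input len(pred) == pred.size, so the bug is invisible until the batch becomes 2-D.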
This problem ships 6 hidden tests. They run in the browser via Pyodide, with no backend or submission queue.
- mse_loss: returns a scalar
- mse_loss: zero when pred == target
- mse_loss: matches the formula
- mse_loss: works on 2-D arrays (averages over ALL elements, not just rows)
- mse_loss_backward: matches 2 * (pred - target) / N
- mse_loss_backward: matches finite-difference numerical gradient