MSE loss + backward (Easy)

Background

Mean squared error is the regression workhorse — used wherever you predict continuous values: pixel intensities in autoencoders, value functions in RL, denoising targets in diffusion models. It is the average of the squared differences between predictions and targets. Implementing the forward and backward together shows how a scalar loss seeds the entire backward pass: its gradient w.r.t. the predictions is the very first grad_out that flows back into the network.

Problem statement

Implement two functions, where $N$ is the total number of elements:

$$\text{forward:}\quad L = \frac{1}{N}\sum_{i} (p_i - t_i)^2, \qquad\qquad \text{backward:}\quad \frac{\partial L}{\partial p_i} = \frac{2\,(p_i - t_i)}{N}$$

mse_loss(pred, target) returns the scalar $L$. mse_loss_backward(pred, target) returns the gradient of $L$ w.r.t. pred, an array with the same shape as pred.
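A minimal sketch of both functions, assuming NumPy inputs and that "scalar float" means a plain Python float. This is one straightforward way to satisfy the formulas, not necessarily the reference solution. The key detail is that both the forward and the backward divide by pred.size.

```python
import numpy as np

def mse_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean of squared differences over every element of pred."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    diff = pred - target
    # np.mean with no axis averages over all elements, i.e. divides by pred.size
    return float(np.mean(diff ** 2))

def mse_loss_backward(pred: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Gradient of mse_loss w.r.t. pred: 2 * (pred - target) / N."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    n = pred.size  # total element count, not len(pred)
    return 2.0 * (pred - target) / n
```

Because the gradient is elementwise, the returned array automatically has the same shape as pred.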

Input

  • pred: np.ndarray of any shape, the model predictions.
  • target: np.ndarray of the same shape, the ground-truth values.

Output

  • mse_loss returns a scalar float.
  • mse_loss_backward returns an np.ndarray of the same shape as pred.

Examples

Example 1 — forward on a vector

Input:  pred = [1.0, 2.0, 3.0, 4.0], target = [1.5, 2.5, 2.5, 4.0]
Output: 0.1875

Explanation: the squared differences are $[0.5^2, 0.5^2, 0.5^2, 0^2] = [0.25, 0.25, 0.25, 0]$; their mean over all 4 elements is $0.75/4 = 0.1875$.
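The arithmetic in Example 1 can be reproduced directly with NumPy, as a quick sanity check (a sketch, not part of the required API):

```python
import numpy as np

pred = np.array([1.0, 2.0, 3.0, 4.0])
target = np.array([1.5, 2.5, 2.5, 4.0])

sq = (pred - target) ** 2    # squared differences: [0.25, 0.25, 0.25, 0.0]
loss = sq.sum() / pred.size  # divide by N = 4: 0.75 / 4 = 0.1875
print(sq, loss)
```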

Example 2 — 2-D input averages over every element

Input:  pred = [[1, 2], [3, 4]], target = [[2, 2], [3, 5]]
Output: mse_loss = 0.5
        mse_loss_backward = 2*(pred - target)/N, N = 4
                          = [[-0.5, 0.0], [0.0, -0.5]]

Explanation: the squared differences are [[1, 0], [0, 1]]; their mean over all $N = 4$ elements is $0.5$. The gradient divides by the same $N = 4$, e.g. $2\cdot(1-2)/4 = -0.5$.
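To make the pred.size versus len(pred) distinction concrete, this sketch recomputes Example 2 both ways. The variable names are illustrative only.

```python
import numpy as np

pred = np.array([[1.0, 2.0], [3.0, 4.0]])
target = np.array([[2.0, 2.0], [3.0, 5.0]])

n_all = pred.size   # 4: every element
n_rows = len(pred)  # 2: only the leading (batch) axis

loss = np.sum((pred - target) ** 2) / n_all          # 0.5, the expected answer
grad = 2.0 * (pred - target) / n_all                 # [[-0.5, 0.0], [0.0, -0.5]]
wrong_loss = np.sum((pred - target) ** 2) / n_rows   # 1.0, twice the correct value
```

For a 2-D input, dividing by the row count inflates both the loss and the gradient by the number of columns.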

Constraints

  • $N$ is the total element count (pred.size), not len(pred); the two differ for 2-D inputs (batch × features). The loss averages over all elements.
  • The backward divides by the same NN as the forward, so analytic and finite-difference gradients agree (atol≈1e-4).
  • mse_loss_backward returns an array shaped like pred; the loss is exactly 0 when pred == target.
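The finite-difference agreement in these constraints can be checked with a central-difference estimate. This is a sketch under the assumption that the hidden test does something similar (its exact epsilon and procedure are not stated here). numerical_grad is a hypothetical helper. The 2-D shape is deliberate, so that a len(pred)-based backward visibly fails.

```python
import numpy as np

def mse_loss(pred, target):
    return float(np.mean((pred - target) ** 2))

def mse_loss_backward(pred, target):
    return 2.0 * (pred - target) / pred.size

def numerical_grad(f, x, eps=1e-5):
    """Central-difference estimate of df/dx, perturbing one element at a time."""
    grad = np.zeros_like(x, dtype=float)
    for idx in np.ndindex(x.shape):
        bumped = x.copy()
        bumped[idx] += eps
        dipped = x.copy()
        dipped[idx] -= eps
        grad[idx] = (f(bumped) - f(dipped)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
pred = rng.standard_normal((3, 4))    # pred.size = 12, len(pred) = 3
target = rng.standard_normal((3, 4))

analytic = mse_loss_backward(pred, target)
numeric = numerical_grad(lambda p: mse_loss(p, target), pred)

# A backward that divides by len(pred) is 4x too large here.
wrong = 2.0 * (pred - target) / len(pred)
print(np.allclose(analytic, numeric, atol=1e-4))  # True
print(np.allclose(wrong, numeric, atol=1e-4))     # False
```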

Notes

  • Why pred.size. Averaging over all elements matches PyTorch's MSELoss(reduction='mean') default and keeps the gradient magnitude independent of batch size. Dividing by len(pred) (the leading-axis size) is the classic bug the finite-difference test catches.
  • Series. Step 4 of build-nn; this loss closes the loop in the XOR-training capstone, where its gradient is backpropagated through the sigmoid and linear layers.
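As a rough illustration of how this gradient seeds the backward pass, the sketch below pushes it through a sigmoid and a single linear layer on the four XOR inputs. The one-layer model, the weight names W and b, and the random initialization are all hypothetical simplifications. The actual capstone's architecture and API are not specified here, and a single linear layer cannot solve XOR on its own.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mse_loss(pred, target):
    return float(np.mean((pred - target) ** 2))

def mse_loss_backward(pred, target):
    return 2.0 * (pred - target) / pred.size

# Hypothetical setup: XOR inputs, one linear layer followed by a sigmoid
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
W = rng.standard_normal((2, 1))
b = np.zeros(1)

# Forward pass
z = X @ W + b          # (4, 1) pre-activations
p = sigmoid(z)         # (4, 1) predictions
loss = mse_loss(p, y)

# Backward pass: the MSE gradient is the first grad_out
grad_p = mse_loss_backward(p, y)   # dL/dp, shape (4, 1)
grad_z = grad_p * p * (1.0 - p)    # chain rule with sigmoid'(z) = s(z) * (1 - s(z))
grad_W = X.T @ grad_z              # dL/dW, shape (2, 1)
grad_b = grad_z.sum(axis=0)        # dL/db, shape (1,)
```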

This problem ships 6 hidden tests, which run in the browser via Pyodide:

  • mse_loss: returns a scalar
  • mse_loss: zero when pred == target
  • mse_loss: matches the formula
  • mse_loss: works on 2-D arrays (averages over ALL elements, not just rows)
  • mse_loss_backward: matches 2 * (pred - target) / N
  • mse_loss_backward: matches finite-difference numerical gradient
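For local experimentation, the first five listed checks can be approximated with plain assertions. This is a sketch only. The hidden tests' actual inputs and tolerances are not shown on this page, and run_checks is a hypothetical helper. The finite-difference check would follow the same pattern with a central-difference estimate.

```python
import numpy as np

def mse_loss(pred, target):
    return float(np.mean((pred - target) ** 2))

def mse_loss_backward(pred, target):
    return 2.0 * (pred - target) / pred.size

def run_checks():
    rng = np.random.default_rng(42)
    p, t = rng.standard_normal((5, 3)), rng.standard_normal((5, 3))
    assert isinstance(mse_loss(p, t), float)                           # returns a scalar
    assert mse_loss(p, p) == 0.0                                        # zero when pred == target
    assert np.isclose(mse_loss(p, t), ((p - t) ** 2).sum() / p.size)    # matches the formula
    # 2-D input: averages over all 15 elements, not over the 5 rows
    assert not np.isclose(mse_loss(p, t), ((p - t) ** 2).sum() / len(p))
    g = mse_loss_backward(p, t)
    assert g.shape == p.shape
    assert np.allclose(g, 2 * (p - t) / 15)                            # 2 * (pred - target) / N
    return True
```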