Linear forward (Wx + b)

Difficulty: Easy

Background

The linear layer (also called dense or fully connected; nn.Linear in PyTorch) is the workhorse of feed-forward networks. It applies a learned affine transform, a matrix multiply plus a bias, to each input vector in a batch. This problem starts the series. Over the next problems you will combine it with non-linearities and a backward pass until you are training a 2-layer MLP on XOR.

Problem statement

Implement linear_forward(x, W, b): the forward pass of a linear layer,

$$y = x\,W + b$$

with $x \in \mathbb{R}^{B\times \text{in}}$, $W \in \mathbb{R}^{\text{in}\times\text{out}}$, and bias $b \in \mathbb{R}^{\text{out}}$ broadcast across the batch. The result $y \in \mathbb{R}^{B\times\text{out}}$ is the pre-activation output.
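Entry by entry, the formula says $y_{ij} = \sum_k x_{ik} W_{kj} + b_j$. The sketch below spells this out with explicit loops. `linear_forward_loops` is a hypothetical name used only for illustration. The loops are far slower than a single `x @ W + b`, so treat this as a readable cross-check, not as a solution.

```python
import numpy as np


def linear_forward_loops(x, W, b):
    # Hypothetical loop-based reference: y[i, j] = sum_k x[i, k] * W[k, j] + b[j].
    B, n_in = x.shape
    n_out = W.shape[1]
    y = np.zeros((B, n_out))
    for i in range(B):              # each sample (row) in the batch
        for j in range(n_out):      # each output feature (column of W)
            acc = b[j]              # start from the bias of feature j
            for k in range(n_in):   # dot product of row i of x with column j of W
                acc += x[i, k] * W[k, j]
            y[i, j] = acc
    return y


rng = np.random.default_rng(0)
x = rng.standard_normal((3, 4))
W = rng.standard_normal((4, 2))
b = rng.standard_normal(2)
print(np.allclose(linear_forward_loops(x, W, b), x @ W + b))  # True
```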

Input

  • x: np.ndarray of shape (B, in_features), a batch of input vectors.
  • W: np.ndarray of shape (in_features, out_features), the weight matrix.
  • b: np.ndarray of shape (out_features,), the bias vector.

Output

Returns an np.ndarray of shape (B, out_features).

Examples

Example 1 — bias broadcasts across the batch

Input:  x = np.zeros((3, 4)), W = np.ones((4, 2)), b = [7.0, -2.0]
Output: [[ 7.0, -2.0],
         [ 7.0, -2.0],
         [ 7.0, -2.0]]

Explanation: with a zero input, x @ W is all zeros, so the output is just the bias repeated across all 3 rows — the (out,) bias broadcasts over the batch dimension with no reshaping.
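The following sketch reproduces the broadcast in Example 1 using NumPy's standard broadcasting rules. The (2,) bias is aligned with the trailing axis of the (3, 2) product and then repeated over the rows. The `np.broadcast_to` and `reshape` variants only make that implicit step visible. As the constraints say, the plain `+ b` needs neither.

```python
import numpy as np

x = np.zeros((3, 4))
W = np.ones((4, 2))
b = np.array([7.0, -2.0])

xw = x @ W        # shape (3, 2), all zeros for a zero input
y = xw + b        # b of shape (2,) is treated as (1, 2), then stretched to (3, 2)

# The same result with the broadcast made explicit:
y_explicit = xw + np.broadcast_to(b, xw.shape)  # bias copied into each of the 3 rows
y_reshaped = xw + b.reshape(1, -1)              # a (1, 2) row also broadcasts

print(y)
# [[ 7. -2.]
#  [ 7. -2.]
#  [ 7. -2.]]
print(np.array_equal(y, y_explicit), np.array_equal(y, y_reshaped))  # True True
```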

Example 2 — identity weight returns the input

Input:  x = [[1.0, 2.0, 3.0]], W = np.eye(3), b = np.zeros(3)
Output: [[1.0, 2.0, 3.0]]

Explanation: $x\,I + 0 = x$, so an identity weight matrix with zero bias passes the input straight through.
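Example 2 can be checked the same way. As an extra illustration that goes beyond the problem, reversing the identity's columns shows that column j of W determines output feature j.

```python
import numpy as np

x = np.array([[1.0, 2.0, 3.0]])  # B = 1, in_features = 3
W = np.eye(3)                    # 3x3 identity, so out_features = 3
b = np.zeros(3)

y = x @ W + b
print(y)        # [[1. 2. 3.]]
print(y.shape)  # (1, 3): the batch axis of size 1 is kept

# Extra illustration: reversing the identity's columns reverses the features,
# because output feature j is the dot product of x with column j of W.
W_rev = np.eye(3)[:, ::-1]
print(x @ W_rev)  # [[3. 2. 1.]]
```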

Constraints

  • Use matrix multiply @ (not elementwise *), then add the bias.
  • The bias is shape (out_features,) and broadcasts across the batch — no reshape needed.
  • Shapes thread as (B, in) · (in, out) → (B, out); the batch dimension is preserved for any B.
  • Tests compare against x @ W + b with atol=1e-7.
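Below is a minimal sketch of one way to meet these constraints. The explicit shape checks and their error messages are an optional addition for illustration; the problem does not ask for them. The `np.asarray` calls let the function also accept the nested lists used in the examples. The comparison uses `np.testing.assert_allclose` with the stated `atol=1e-7`. The last lines show why `*` is the wrong operator: NumPy cannot broadcast (5, 4) against (4, 3), so it raises a ValueError.

```python
import numpy as np


def linear_forward(x, W, b):
    """Forward pass of a linear layer: y = x @ W + b (row-vector convention).

    x: (B, in_features), W: (in_features, out_features), b: (out_features,)
    Returns an array of shape (B, out_features).
    """
    x, W, b = np.asarray(x), np.asarray(W), np.asarray(b)  # also accept nested lists
    # Optional validation: fail early with a readable message on a shape mismatch.
    if x.ndim != 2 or W.ndim != 2 or b.ndim != 1:
        raise ValueError("expected x and W to be 2-D and b to be 1-D")
    if x.shape[1] != W.shape[0]:
        raise ValueError(f"in_features mismatch: x has {x.shape[1]}, W has {W.shape[0]}")
    if W.shape[1] != b.shape[0]:
        raise ValueError(f"out_features mismatch: W has {W.shape[1]}, b has {b.shape[0]}")
    return x @ W + b  # matrix multiply, then the (out,) bias broadcasts over the batch


rng = np.random.default_rng(1)
x = rng.standard_normal((5, 4))
W = rng.standard_normal((4, 3))
b = rng.standard_normal(3)

y = linear_forward(x, W, b)
print(y.shape)  # (5, 3)
np.testing.assert_allclose(y, x @ W + b, atol=1e-7)

# Elementwise * is not a matrix multiply: (5, 4) and (4, 3) cannot broadcast.
try:
    x * W
except ValueError as err:
    print("x * W raised ValueError:", err)
```

With square shapes where B = in = out, `x * W` would run without error and silently return an elementwise product. A shape test alone would not catch that mistake, which is why the reference comparison matters.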

Notes

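  • Convention. This uses the row-vector convention $y = xW + b$ (inputs are rows, W is (in, out)), which matches how nn.Linear consumes a (batch, features) tensor. Note that nn.Linear stores its weight with shape (out, in) and computes x @ weight.T + bias, so this problem's W corresponds to nn.Linear's weight.T.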
  • Series. This is step 1 of build-nn; build-nn-05-linear-backward derives the gradients for this same layer once the forward pass is solid.
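The title's $Wx + b$ and this problem's $xW + b$ describe the same map. The short check below assumes that the column-vector form stores the weight as (out, in) and stacks samples as columns. Because $(xW)^\top = W^\top x^\top$, transposing one result recovers the other.

```python
import numpy as np

rng = np.random.default_rng(2)
B, n_in, n_out = 4, 3, 2
x = rng.standard_normal((B, n_in))      # samples as rows
W = rng.standard_normal((n_in, n_out))  # (in, out), as in this problem
b = rng.standard_normal(n_out)

# Row-vector convention used in this problem.
y_row = x @ W + b                       # (B, out)

# Column-vector convention "Wx + b": weight stored as (out, in) = W.T,
# samples stacked as columns (in, B), bias added as an (out, 1) column.
W_col = W.T                             # (out, in), the shape of nn.Linear's .weight
y_col = W_col @ x.T + b[:, None]        # (out, B)

print(np.allclose(y_row, y_col.T))      # True
```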

This problem ships 5 hidden tests, which run in your browser via Pyodide (no backend, no submission queue):

  • Output shape is (B, out_features)
  • Matches x @ W + b reference
  • Bias broadcasts across the batch dimension
  • Single example (B=1) works
  • Larger batch shape is preserved
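For local experimentation, the sketch below writes one plain-assert check for each listed test name. These checks are hypothetical stand-ins, not the actual hidden tests, whose inputs are not shown here. The batch sizes and seed are arbitrary choices.

```python
import numpy as np


def linear_forward(x, W, b):
    return x @ W + b


rng = np.random.default_rng(3)
x, W, b = rng.standard_normal((6, 4)), rng.standard_normal((4, 3)), rng.standard_normal(3)

# Output shape is (B, out_features)
assert linear_forward(x, W, b).shape == (6, 3)

# Matches x @ W + b reference (same atol as stated in the constraints)
np.testing.assert_allclose(linear_forward(x, W, b), x @ W + b, atol=1e-7)

# Bias broadcasts across the batch dimension: a zero input leaves only the bias in every row
y = linear_forward(np.zeros((3, 4)), np.ones((4, 2)), np.array([7.0, -2.0]))
assert np.array_equal(y, np.tile([7.0, -2.0], (3, 1)))

# Single example (B=1) works
assert linear_forward(x[:1], W, b).shape == (1, 3)

# Larger batch shape is preserved
big = rng.standard_normal((1024, 4))
assert linear_forward(big, W, b).shape == (1024, 3)

print("all local checks passed")
```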