Linear regression — normal equationMedium

Linear regression — normal equation

Background

Linear regression fits a straight-line (or hyperplane) model y^=Xθ\hat{y} = X\theta by choosing the coefficients θ\theta that make the predictions as close as possible to the targets yy, measured by squared error. The normal equation solves that minimisation in closed form — a single matrix expression, with no learning rate and no iteration. It is the first model in almost every ML course and the foundation for ridge, lasso, and generalised linear models.

Problem statement

Implement linear_regression_normal_equation(X, y) that computes the ordinary-least-squares coefficients θ\theta minimising the squared residual Xθy2\lVert X\theta - y \rVert^2. The closed-form solution is:

θ=(XX)1Xy\theta = \left(X^\top X\right)^{-1} X^\top y

Input

  • Xlist[list[float]] of shape (n_samples, n_features): the design matrix, one row per sample. To fit an intercept, the caller puts a leading column of ones in X; the function does not add one.
  • ylist[float] of length n_samples: the target vector.

Output

Returns list[float] of length n_features: the fitted coefficients θ\theta in the column order of X, each rounded to 4 decimal places. When the first column of X is all ones, theta[0] is the intercept.

Examples

Example 1 — exact fit

Input:  X = [[1, 1], [1, 2], [1, 3]], y = [1, 2, 3]
Output: [0.0, 1.0]

Explanation: column 0 is the bias (all ones) and column 1 is the feature xx. The points lie exactly on the line y=0+1xy = 0 + 1\cdot x, so least squares recovers intercept 00 and slope 11.

Example 2 — least-squares fit on noisy data

Input:  X = [[1, 1], [1, 2], [1, 3], [1, 4]], y = [6, 5, 7, 10]
Output: [3.5, 1.4]

Explanation: no line passes through all four points, so the normal equation returns the squared-error minimiser. With a single feature it reduces to the familiar slope/intercept formulas:

slope=i(xixˉ)(yiyˉ)i(xixˉ)2=7.05.0=1.4\text{slope} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} = \frac{7.0}{5.0} = 1.4 intercept=yˉslopexˉ=71.4×2.5=3.5\text{intercept} = \bar{y} - \text{slope}\cdot\bar{x} = 7 - 1.4 \times 2.5 = 3.5

Constraints

  • X has shape (n_samples, n_features) and y has length n_samples, with at least as many samples as features (ndn \ge d) and XXX^\top X invertible (features full column rank — not perfectly collinear).
  • Pure OLS: no regularisation, and do not prepend a bias column yourself.
  • Return a flat list[float] of length n_features, rounded with np.round(theta, 4); -0.0 is an accepted form of 0.0.
  • Vectorise with numpy; the hidden tests compare against the 4-dp coefficients with atol=1e-4.

Notes

  • The normal equation is the exact minimiser of Xθy2\lVert X\theta - y \rVert^2. Forming (XX)1\left(X^\top X\right)^{-1} costs O(d3)O(d^3) in the number of features dd — fine when dd is small, but numerically fragile when XXX^\top X is near-singular (highly correlated features). That instability is exactly what ridge regression's +λI+\lambda I term cures.
  • Whether theta[0] is an "intercept" depends entirely on whether the first column of X is ones; the function itself is agnostic to that choice.
Python
Loading...

This problem ships 3 hidden tests. They run in your browser via Pyodide — no backend, no submission queue. Press ▶ Run tests to execute.

  • Exact fit: y = 0 + 1*x recovers [0.0, 1.0]
  • Least-squares fit on non-collinear data -> [3.5, 1.4]
  • Returns a flat list of n_features floats