Linear regression — normal equation
Background
Linear regression fits a straight-line (or hyperplane) model by choosing the coefficients $\theta$ that make the predictions $X\theta$ as close as possible to the targets $y$, measured by squared error. The normal equation solves that minimisation in closed form: a single matrix expression, with no learning rate and no iteration. It is the first model in almost every ML course and the foundation for ridge, lasso, and generalised linear models.
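The closed form comes from setting the gradient of the squared error to zero. A short derivation, writing $J(\theta)$ for the squared-error objective (a name introduced here for convenience), with $X$ the design matrix and $y$ the target vector:

```latex
\begin{aligned}
J(\theta) &= \lVert X\theta - y \rVert_2^2 = (X\theta - y)^\top (X\theta - y) \\
\nabla_\theta J(\theta) &= 2\,X^\top (X\theta - y) = 0 \\
X^\top X\,\theta &= X^\top y \\
\theta &= (X^\top X)^{-1} X^\top y \quad \text{(when } X^\top X \text{ is invertible)}
\end{aligned}
```

The third line is the "normal equation" itself; the last line is its solution when the inverse exists.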
Problem statement
Implement linear_regression_normal_equation(X, y) that computes the ordinary-least-squares coefficients minimising the squared residual $\lVert X\theta - y \rVert_2^2$. The closed-form solution is:

$$\theta = (X^\top X)^{-1} X^\top y$$
Input
- X (list[list[float]] of shape (n_samples, n_features)): the design matrix, one row per sample. To fit an intercept, the caller puts a leading column of ones in X; the function does not add one.
- y (list[float] of length n_samples): the target vector.
Output
Returns list[float] of length n_features: the fitted coefficients in the column order of X, each rounded to 4 decimal places. When the first column of X is all ones, theta[0] is the intercept.
Examples
Example 1 — exact fit
Input: X = [[1, 1], [1, 2], [1, 3]], y = [1, 2, 3]
Output: [0.0, 1.0]
Explanation: column 0 is the bias (all ones) and column 1 is the feature $x$. The points lie exactly on the line $y = x$, so least squares recovers intercept $0$ and slope $1$.
Example 2 — least-squares fit on noisy data
Input: X = [[1, 1], [1, 2], [1, 3], [1, 4]], y = [6, 5, 7, 10]
Output: [3.5, 1.4]
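Explanation: no line passes through all four points, so the normal equation returns the squared-error minimiser. With a single feature it reduces to the familiar slope/intercept formulas:

$$\text{slope} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} = \frac{7}{5} = 1.4, \qquad \text{intercept} = \bar{y} - \text{slope}\cdot\bar{x} = 7 - 1.4 \times 2.5 = 3.5$$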
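The single-feature slope and intercept formulas can be checked by hand on the Example 2 data. The snippet below is a minimal sketch of that arithmetic; the variable names (s_xy, s_xx, and so on) are chosen here for illustration and are not part of the problem's API.

```python
import numpy as np

# Feature values (column 1 of X) and targets from Example 2.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([6.0, 5.0, 7.0, 10.0])

x_mean, y_mean = x.mean(), y.mean()          # 2.5 and 7.0

# slope = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
s_xy = np.sum((x - x_mean) * (y - y_mean))   # 7.0
s_xx = np.sum((x - x_mean) ** 2)             # 5.0
slope = s_xy / s_xx

# intercept = y_mean - slope * x_mean
intercept = y_mean - slope * x_mean

print([round(float(intercept), 4), round(float(slope), 4)])  # [3.5, 1.4]
```

The result is ordered as [intercept, slope] to match the column order of X, where the bias column comes first.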
Constraints
- X has shape (n_samples, n_features) and y has length n_samples, with at least as many samples as features ($n_\text{samples} \ge n_\text{features}$) and $X^\top X$ invertible (X has full column rank: features not perfectly collinear).
- Pure OLS: no regularisation, and do not prepend a bias column yourself.
- Return a flat list[float] of length n_features, rounded with np.round(theta, 4); -0.0 is an accepted form of 0.0.
- Vectorise with numpy; the hidden tests compare against the 4-dp coefficients with atol=1e-4.
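One way the requirements above might be met is sketched below. It is not the reference solution: in particular, it solves the linear system with np.linalg.solve instead of forming the explicit inverse, which is an implementation choice made here and gives the same coefficients whenever the inverse exists.

```python
import numpy as np

def linear_regression_normal_equation(X, y):
    """OLS coefficients via the normal equation, rounded to 4 decimal places."""
    X = np.asarray(X, dtype=float)   # shape (n_samples, n_features)
    y = np.asarray(y, dtype=float)   # shape (n_samples,)

    # Solve (X^T X) theta = X^T y directly (assumption: X^T X is invertible,
    # as the constraints guarantee). No bias column is added here.
    theta = np.linalg.solve(X.T @ X, X.T @ y)

    # Flat list of n_features floats, rounded as the constraints require.
    return np.round(theta, 4).tolist()

# The two worked examples from this problem.
theta_exact = linear_regression_normal_equation([[1, 1], [1, 2], [1, 3]], [1, 2, 3])
theta_noisy = linear_regression_normal_equation(
    [[1, 1], [1, 2], [1, 3], [1, 4]], [6, 5, 7, 10]
)
print(theta_exact, theta_noisy)
```

Floating-point round-off can leave the first coefficient of the exact-fit example as 0.0 or -0.0; both are accepted under the stated tolerance.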
Notes
- The normal-equation solution is the exact minimiser of $\lVert X\theta - y \rVert_2^2$. Forming $(X^\top X)^{-1}$ costs $O(d^3)$ in the number of features $d$: fine when $d$ is small, but numerically fragile when $X^\top X$ is near-singular (highly correlated features). That instability is exactly what ridge regression's $\lambda I$ term cures.
- Whether theta[0] is an "intercept" depends entirely on whether the first column of X is ones; the function itself is agnostic to that choice.
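The fragility with highly correlated features can be seen in the condition number of $X^\top X$. The sketch below uses made-up data with two nearly identical columns and a hypothetical ridge strength lam; both are assumptions for illustration only and lie outside this problem's pure-OLS scope.

```python
import numpy as np

# Two nearly identical features (hypothetical data for illustration).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = x1 + 1e-6 * np.array([1.0, -1.0, 1.0, -1.0, 1.0])
X = np.column_stack([x1, x2])

gram = X.T @ X          # the matrix the normal equation must invert
lam = 1e-3              # hypothetical ridge strength

# A large condition number means small input errors are greatly amplified.
cond_ols = np.linalg.cond(gram)
cond_ridge = np.linalg.cond(gram + lam * np.eye(X.shape[1]))

print(f"cond(X^T X)           = {cond_ols:.3e}")
print(f"cond(X^T X + lam * I) = {cond_ridge:.3e}")
```

Adding lam to the diagonal lifts the smallest eigenvalue away from zero, so the second condition number should come out many orders of magnitude smaller than the first.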
This problem ships 3 hidden tests. They run in your browser via Pyodide, with no backend and no submission queue.
- Exact fit: y = 0 + 1*x recovers [0.0, 1.0]
- Least-squares fit on non-collinear data -> [3.5, 1.4]
- Returns a flat list of n_features floats