Linear regression — gradient descentEasy

Linear regression — gradient descent

Background

Linear regression fits a model y^=Xθ\hat{y} = X\theta by minimising the mean squared error between predictions and targets. Where the normal equation solves this in closed form, gradient descent reaches the same optimum iteratively — the approach that scales to millions of features and is the literal inner loop of training every neural network. The squared-error surface is convex, so with a sensible step size gradient descent is guaranteed to converge to the global minimum.

Problem statement

Implement linear_regression_gradient_descent(X, y, alpha, iterations) that fits the coefficients θ\theta by batch gradient descent on the mean squared error. Initialise θ=0\theta = 0 and repeat the update iterations times:

y^=Xθ,e=y^y\hat{y} = X\theta, \qquad e = \hat{y} - y θθαmXe\theta \leftarrow \theta - \frac{\alpha}{m}\, X^\top e

where mm is the number of training examples and α\alpha is the learning rate.

Input

  • Xnp.ndarray of shape (m, n): the design matrix, with a leading column of ones for the intercept already included.
  • ynp.ndarray of shape (m,): the target values.
  • alphafloat: the learning rate.
  • iterationsint: the number of full-batch gradient-descent steps.

Output

Returns an np.ndarray of shape (n,): the fitted coefficients θ\theta, rounded to 4 decimal places (theta[0] is the intercept).

Examples

Example 1 — clean data converges to the exact line

Input:  X = [[1, 1], [1, 2], [1, 3]], y = [1, 2, 3], alpha = 0.1, iterations = 1000
Output: [0.0, 1.0]

Explanation: the points lie exactly on y=xy = x, so gradient descent drives θ\theta toward an intercept of 00 and a slope of 11.

Example 2 — noisy data converges to the least-squares solution

Input:  X = [[1, 1], [1, 2], [1, 3], [1, 4]], y = [6, 5, 7, 10], alpha = 0.05, iterations = 5000
Output: [3.5, 1.4]

Explanation: no line fits all four points, so gradient descent converges to the same minimiser the normal equation gives, [3.5,1.4][3.5, 1.4].

Constraints

  • X already contains the intercept column; initialise θ\theta to zeros.
  • Use the mean gradient 1mXe\frac{1}{m}X^\top e (averaged over the mm examples), scaled by alpha.
  • Return an np.ndarray of shape (n,) rounded with np.round(theta, 4).
  • Choose alpha small enough to converge; tests compare with atol=1e-3.

Notes

  • The MSE surface is convex, so the only fixed point is the global optimum — given enough iterations and a small enough α\alpha, gradient descent reaches exactly what the normal equation computes in one step.
  • The trade-off: gradient descent costs O(mn)O(mn) per step but avoids the O(n3)O(n^3) matrix inverse, which is why it (and its stochastic/mini-batch variants) wins on large, high-dimensional data.
Python
Loading...

This problem ships 4 hidden tests. They run in your browser via Pyodide — no backend, no submission queue. Press ▶ Run tests to execute.

  • Clean data y = x converges to intercept 0, slope 1
  • Noisy data converges to the normal-equation (OLS) solution [3.5, 1.4]
  • Returns a 1-D array with one coefficient per column of X
  • More iterations reduce the training MSE