Elastic-Net regression (gradient descent)
Background
Elastic Net is linear regression with both an L1 (Lasso) and an L2 (Ridge) penalty on the weights. L1 drives some coefficients exactly to zero (feature selection); L2 shrinks them smoothly and stabilises the fit when features are correlated. Elastic Net blends the two, which makes it a common choice when you have many, possibly correlated features. Because the L1 term is non-differentiable at zero, the objective has no closed-form minimiser, so here it is trained by gradient descent, using a subgradient for the L1 term.
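For reference, one way to write the objective that matches the gradient this problem describes (data term + L1 subgradient + L2 term). The $\frac{1}{2n}$ scaling of the squared error is an assumption, chosen so that the data gradient carries a factor $\frac{1}{n}$; $n$ is the number of samples and $x_i$ is the $i$-th row of X:

```latex
J(w, b) = \frac{1}{2n} \sum_{i=1}^{n} \left( x_i^\top w + b - y_i \right)^2
          + \alpha_1 \lVert w \rVert_1
          + \alpha_2 \lVert w \rVert_2^2
```

Setting $\alpha_1 = \alpha_2 = 0$ recovers ordinary least squares up to the constant factor.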
Problem statement
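Implement elastic_net_gradient_descent(X, y, alpha1, alpha2, learning_rate, max_iter, tol) that fits weights $w$ and bias $b$ by gradient descent on the elastic-net objective. Initialise $w = 0$, $b = 0$, and repeat, with $\hat{y} = Xw + b$, $n$ = n_samples and step size $\eta$ = learning_rate:
$$\nabla_w = \frac{1}{n} X^\top (\hat{y} - y) + \alpha_1\,\mathrm{sign}(w) + 2\alpha_2 w, \qquad \nabla_b = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)$$
$$w \leftarrow w - \eta\,\nabla_w, \qquad b \leftarrow b - \eta\,\nabla_b$$
Stop early once $\lVert \nabla_w \rVert_1 < \text{tol}$.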
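A minimal sketch of one possible solution, not the site's reference implementation. It assumes the data term is scaled by 1/n_samples, the L2 term is 2 * alpha2 * w, and the stopping check runs after each update; the local names grad_w and grad_b are illustrative.

```python
import numpy as np


def elastic_net_gradient_descent(X, y, alpha1, alpha2, learning_rate, max_iter, tol):
    """Fit weights and bias by (sub)gradient descent on the elastic-net objective."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_samples, n_features = X.shape

    weights = np.zeros(n_features)
    bias = 0.0

    for _ in range(max_iter):
        # Residuals of the current linear model.
        error = X @ weights + bias - y

        # Data term (assumed 1/n scaling) + L1 subgradient + L2 term.
        grad_w = (X.T @ error / n_samples
                  + alpha1 * np.sign(weights)
                  + 2.0 * alpha2 * weights)
        # The bias is not penalised, so it only sees the data term.
        grad_b = error.mean()

        weights -= learning_rate * grad_w
        bias -= learning_rate * grad_b

        # Early stop on the L1 norm of the weight gradient only.
        if np.abs(grad_w).sum() < tol:
            break

    return weights, float(bias)


weights, bias = elastic_net_gradient_descent(
    [[0, 0], [1, 1], [2, 2]], [0, 1, 2],
    alpha1=0.1, alpha2=0.1, learning_rate=0.01, max_iter=1000, tol=1e-4,
)
print(np.round(weights, 4), round(bias, 4))
```

Under these assumptions the result agrees with Example 1 within the stated atol=1e-2.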
Input
- X, np.ndarray of shape (n_samples, n_features): feature matrix (bias is a separate scalar, not a column).
- y, np.ndarray of shape (n_samples,): targets.
- alpha1, float: L1 (Lasso) strength.
- alpha2, float: L2 (Ridge) strength.
- learning_rate, float: step size $\eta$.
- max_iter, int: maximum number of gradient steps.
- tol, float: stop when the L1 norm of the weight gradient falls below this.
Output
Returns a tuple (weights, bias):
- weights, np.ndarray of shape (n_features,).
- bias, float.
Examples
Example 1
Input: X = [[0, 0], [1, 1], [2, 2]], y = [0, 1, 2]
alpha1 = 0.1, alpha2 = 0.1, learning_rate = 0.01, max_iter = 1000, tol = 1e-4
Output: weights = [0.3732, 0.3732], bias = 0.2479
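Explanation: the data follows $y = x$ exactly (both feature columns equal $x$), but the L1 + L2 penalties shrink each weight well below the 0.5 that an unpenalised fit from zero initialisation would reach, and push the leftover signal into the bias, trading a little fit for smaller, more stable coefficients.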
Constraints
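- Initialise weights to zeros and bias to 0.
- Gradient = data term $\frac{1}{n} X^\top (\hat{y} - y)$ + L1 subgradient $\alpha_1\,\mathrm{sign}(w)$ + L2 term $2\alpha_2 w$, where $\hat{y} = Xw + b$ and $n$ = n_samples.
- Stop early if $\lVert \nabla_w \rVert_1 < \text{tol}$; otherwise run max_iter steps.
- Tests compare with atol=1e-2.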
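As a worked illustration of the gradient decomposition, the sketch below computes the very first step on the Example 1 data. It assumes the data term is scaled by 1/n_samples; the variable names are illustrative. At zero initialisation both penalty terms vanish, so the first step is driven by the data term alone.

```python
import numpy as np

X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
y = np.array([0.0, 1.0, 2.0])
alpha1, alpha2, learning_rate = 0.1, 0.1, 0.01

weights = np.zeros(X.shape[1])
bias = 0.0

# Residuals at the starting point: predictions are all zero, so error = -y.
error = X @ weights + bias - y

data_term = X.T @ error / len(y)      # (1/n) X^T (y_hat - y), assumed scaling
l1_term = alpha1 * np.sign(weights)   # sign(0) = 0, so this is zero here
l2_term = 2.0 * alpha2 * weights      # also zero at w = 0

grad_w = data_term + l1_term + l2_term
grad_b = error.mean()

new_weights = weights - learning_rate * grad_w
new_bias = bias - learning_rate * grad_b

print("grad_w:", grad_w, "grad_b:", grad_b)
print("after one step:", new_weights, new_bias)
```

The weight gradient is -5/3 for each feature and the bias gradient is -1, so one step moves the weights to about 0.0167 and the bias to 0.01.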
Notes
- The L1 subgradient np.sign(w) is 0 at exactly $w = 0$, so a weight that reaches zero feels no further L1 push and can stay sparse; this is what gives Lasso/Elastic-Net their feature-selection behaviour.
- Pure Lasso is $\alpha_2 = 0$; pure Ridge is $\alpha_1 = 0$; Elastic Net interpolates. The factor 2 on the L2 term is just the derivative of $w^2$.
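Both notes can be checked numerically. The short sketch below (illustrative values only) shows that np.sign returns 0 at zero, and uses a central finite difference to confirm that the derivative of $w^2$ is $2w$.

```python
import numpy as np

w = np.array([-2.0, 0.0, 3.0])

# L1 subgradient: -1 for negative weights, 0 at exactly zero, +1 for positive.
l1_subgradient = np.sign(w)

# Analytic gradient of the elementwise square w**2.
l2_gradient = 2.0 * w

# Central finite difference of w**2 as an independent check.
eps = 1e-6
numeric_gradient = ((w + eps) ** 2 - (w - eps) ** 2) / (2.0 * eps)

print(l1_subgradient)
print(l2_gradient, numeric_gradient)
```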
This problem ships 4 hidden tests. They run in your browser via Pyodide, with no backend and no submission queue.
- Reproduces the reference fit on y = x data
- Stronger L1 penalty shrinks the weights
- Returns a weights array of shape (n_features,) and a scalar bias
- Without regularization, more iterations reduce training MSE