Logistic regression — gradient descent
Background
Logistic regression is the workhorse binary classifier: it passes a linear score through the sigmoid to produce a probability, and is trained by minimising binary cross-entropy (log loss). Linear regression can be solved in closed form with the normal equation, but logistic regression has no closed-form solution, so its parameters are found iteratively by gradient descent. It underpins everything from click-through prediction to the final layer of a neural-network classifier.
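To make the two ingredients concrete, here is a minimal NumPy sketch of the sigmoid and the summed BCE loss. The weights, bias and input are made-up illustrative values, not taken from the problem.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score (scalar or array) to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(p, y):
    """Summed binary cross-entropy between probabilities p and labels y in {0, 1}."""
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Illustrative (made-up) weights, bias and input for a two-feature model.
w = np.array([0.5, -0.25])
b = 0.1
x = np.array([2.0, 1.0])

score = w @ x + b      # linear score: 1.0 - 0.25 + 0.1 = 0.85
prob = sigmoid(score)  # about 0.70, read as P(y = 1 | x)

# Log loss punishes confident mistakes: predicting 0.99 for a negative costs -ln(0.01).
confident_wrong = bce_loss(np.array([0.99]), np.array([0.0]))  # about 4.61
```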
Problem statement
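Implement train_logreg(X, y, learning_rate, iterations) that trains logistic regression by gradient descent on the binary cross-entropy loss and returns the learned coefficients together with the loss recorded at every iteration. Prepend a bias column of ones to X, start from $\theta = 0$, and at each step use the sigmoid prediction and the BCE gradient:
$$p = \sigma(X\theta) = \frac{1}{1 + e^{-X\theta}}, \qquad L(\theta) = -\sum_{i=1}^{N} \left[ y_i \ln p_i + (1 - y_i) \ln(1 - p_i) \right]$$
$$\nabla_\theta L = X^\top (p - y), \qquad \theta \leftarrow \theta - \alpha \, \nabla_\theta L$$
where $X$ includes the bias column, $\sigma$ is applied element-wise, $N$ is the number of samples and $\alpha$ is learning_rate.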
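A minimal NumPy sketch of train_logreg follows. It assumes the loss for each iteration is computed from that iteration's predictions before $\theta$ is updated, which is the convention that makes losses[0] equal $N \ln 2$; treat it as one possible implementation rather than the reference solution.

```python
import numpy as np

def train_logreg(X, y, learning_rate, iterations):
    """Full-batch gradient descent on the summed BCE loss.

    Returns (coefficients, losses); coefficients has the bias first.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so theta[0] acts as the bias.
    Xb = np.column_stack([np.ones(X.shape[0]), X])
    theta = np.zeros(Xb.shape[1])
    losses = []
    for _ in range(iterations):
        p = 1.0 / (1.0 + np.exp(-(Xb @ theta)))   # sigmoid predictions
        loss = -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
        losses.append(round(float(loss), 4))      # loss at the current theta, before the update
        grad = Xb.T @ (p - y)                      # summed BCE gradient
        theta = theta - learning_rate * grad
    coefficients = [round(float(c), 4) for c in theta]
    return coefficients, losses

# Example 1 from this problem.
coefficients, losses = train_logreg(
    np.array([[1.0, 0.5], [-0.5, -1.5], [2.0, 1.5], [-2.0, -1.0]]),
    np.array([1, 0, 1, 0]),
    learning_rate=0.01,
    iterations=20,
)
# per Example 1: coefficients -> [-0.0003, 0.4038, 0.3379], losses[0] -> 2.7726
```

Recording the loss before the update means the last entry of losses describes the parameters one step before the returned coefficients; swapping the two lines would shift every loss by one step and break the $N \ln 2$ check.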
Input
- X (np.ndarray of shape (n_samples, n_features)): the feature matrix, without a bias column (the function prepends one).
- y (np.ndarray of shape (n_samples,)): binary labels in $\{0, 1\}$.
- learning_rate (float): the gradient-descent step size $\alpha$.
- iterations (int): the number of full-batch gradient steps.
Output
Returns a tuple (coefficients, losses):
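- coefficients (list[float] of length n_features + 1, bias first): the learned $\theta$, each entry rounded to 4 decimals.
- losses (list[float] of length iterations): the total BCE loss at each step, computed from that step's predictions before the update (so losses[0] is the loss at $\theta = 0$), each rounded to 4 decimals.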
Examples
Example 1
Input: X = [[1.0, 0.5], [-0.5, -1.5], [2.0, 1.5], [-2.0, -1.0]]
y = [1, 0, 1, 0], learning_rate = 0.01, iterations = 20
Output: coefficients = [-0.0003, 0.4038, 0.3379]
losses[0] = 2.7726, falling monotonically over the 20 steps
Explanation: with $\theta = 0$ every prediction is $p = \sigma(0) = 0.5$, so the first loss is $4 \ln 2 \approx 2.7726$. Each gradient step raises $p$ on the positives and lowers it on the negatives, so the bias stays near $0$ while the two feature weights grow to separate the classes and the loss falls monotonically.
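The first iteration of Example 1 can be checked by hand. With the bias column prepended, $X$ has rows $(1, 1, 0.5)$, $(1, -0.5, -1.5)$, $(1, 2, 1.5)$ and $(1, -2, -1)$:
$$p^{(0)} = \sigma(X \cdot 0) = (0.5, 0.5, 0.5, 0.5), \qquad L^{(0)} = -4 \ln 0.5 = 4 \ln 2 \approx 2.7726$$
$$p^{(0)} - y = (-0.5,\ 0.5,\ -0.5,\ 0.5)$$
$$\nabla_\theta L = X^\top (p^{(0)} - y) = \begin{pmatrix} 0 \\ -2.75 \\ -2.25 \end{pmatrix}, \qquad \theta^{(1)} = \theta^{(0)} - 0.01\, \nabla_\theta L = \begin{pmatrix} 0 \\ 0.0275 \\ 0.0225 \end{pmatrix}$$
The bias component of this first gradient is exactly zero because the classes are balanced and every residual has magnitude 0.5; later steps move the bias only slightly, which is why it ends near $-0.0003$.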
Constraints
- Prepend a single bias column of ones to X; initialise $\theta$ to zeros.
- Use the summed (not averaged) BCE loss, with gradient $\nabla_\theta L = X^\top (p - y)$.
- Round the coefficients and every loss value to 4 decimal places.
- The learning_rate is small enough that the loss is non-increasing; tests compare with atol=1e-3.
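The summed-versus-averaged choice changes the step size, not the direction: dividing the gradient by N makes each step N times smaller at the same learning_rate, so an averaged implementation would not reproduce Example 1. A small sketch using a hypothetical helper bce_grad on Example 1's data (bias column already prepended):

```python
import numpy as np

def bce_grad(Xb, y, theta, average=False):
    """BCE gradient X^T (p - y); divided by the sample count when average=True."""
    p = 1.0 / (1.0 + np.exp(-(Xb @ theta)))
    grad = Xb.T @ (p - y)
    return grad / len(y) if average else grad

# Example 1's data with the bias column already prepended.
Xb = np.array([[1.0, 1.0, 0.5], [1.0, -0.5, -1.5], [1.0, 2.0, 1.5], [1.0, -2.0, -1.0]])
y = np.array([1.0, 0.0, 1.0, 0.0])
theta = np.zeros(3)

summed = bce_grad(Xb, y, theta)                  # [0, -2.75, -2.25]
averaged = bce_grad(Xb, y, theta, average=True)  # [0, -0.6875, -0.5625]

alpha = 0.01
step_summed = alpha * summed                  # summed loss, rate alpha
step_averaged = (len(y) * alpha) * averaged   # averaged loss, rate N * alpha
```

In other words, a summed-loss step with rate $\alpha$ is the same update as an averaged-loss step with rate $N\alpha$.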
Notes
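- The BCE gradient simplifies neatly: despite the logarithms and the sigmoid in the loss, $\nabla_\theta L = X^\top (p - y)$, the same form as linear regression's gradient with $p$ in place of $X\theta$. Together with the fact that the BCE loss is convex in $\theta$, this is why logistic regression trains so stably.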
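- Numerical care: the loss diverges if $p$ reaches exactly $0$ or $1$, since $\ln 0 = -\infty$. With zero init and few iterations this won't happen, but production code clips $p$ into $[\varepsilon, 1 - \varepsilon]$ for a small $\varepsilon$.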
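Both notes can be checked numerically. The sketch below (hypothetical helper names bce_loss, bce_grad and numeric_grad; the clipping margin eps = 1e-12 is an illustrative choice, not a prescribed value) compares the closed-form gradient against central finite differences, then shows a saturated prediction making the unclipped loss infinite while the clipped one stays finite.

```python
import numpy as np

def bce_loss(Xb, y, theta, eps=None):
    """Summed BCE loss; if eps is given, clip p into [eps, 1 - eps] before taking logs."""
    p = 1.0 / (1.0 + np.exp(-(Xb @ theta)))
    if eps is not None:
        p = np.clip(p, eps, 1.0 - eps)
    with np.errstate(divide="ignore"):  # let log(0) return -inf quietly
        return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def bce_grad(Xb, y, theta):
    """Analytic gradient of the summed BCE loss: X^T (p - y)."""
    p = 1.0 / (1.0 + np.exp(-(Xb @ theta)))
    return Xb.T @ (p - y)

def numeric_grad(f, theta, h=1e-6):
    """Central finite differences, one coordinate at a time."""
    g = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = h
        g[j] = (f(theta + step) - f(theta - step)) / (2 * h)
    return g

# Example 1's data with the bias column already prepended.
Xb = np.array([[1.0, 1.0, 0.5], [1.0, -0.5, -1.5], [1.0, 2.0, 1.5], [1.0, -2.0, -1.0]])
y = np.array([1.0, 0.0, 1.0, 0.0])

# 1) The analytic gradient matches finite differences at an arbitrary theta.
theta = np.array([0.1, 0.4, -0.3])
analytic = bce_grad(Xb, y, theta)
numeric = numeric_grad(lambda t: bce_loss(Xb, y, t, eps=1e-12), theta)

# 2) A badly wrong, saturated theta gives the last (negative) sample a score of 80,
#    so p rounds to exactly 1.0: the raw loss becomes inf, the clipped loss stays finite.
bad_theta = np.array([0.0, -40.0, 0.0])
raw_loss = bce_loss(Xb, y, bad_theta)
clipped_loss = bce_loss(Xb, y, bad_theta, eps=1e-12)
```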
This problem ships 4 hidden tests, which run in the browser via Pyodide:
- Reproduces the reference run: coefficients and first loss.
- First loss equals $N \ln 2$ from zero init (every prediction is 0.5).
- BCE loss is monotonically non-increasing under gradient descent.
- Returns a bias term plus one weight per feature.