Softmax (multinomial) regression
Background
Softmax regression (multinomial logistic regression) generalises logistic regression from 2 classes to $C$ classes. It scores every class with a linear function, applies the softmax to turn those scores into a probability distribution over the classes, and is trained by minimising the multi-class cross-entropy with gradient descent. This is the final linear layer plus loss of almost every neural-network classifier.
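In symbols, the model can be written as below. The notation is an assumption made for illustration, since the page does not fix one: $\tilde{x}_i$ is sample $i$ with a leading 1 for the bias, $B$ is the $(M+1) \times C$ parameter matrix, and $y_i$ is the integer label of sample $i$.

```latex
z_{ic} = \sum_{j=0}^{M} \tilde{x}_{ij} B_{jc}, \qquad
p_{ic} = \frac{\exp(z_{ic})}{\sum_{k=0}^{C-1} \exp(z_{ik})}, \qquad
L(B) = -\sum_{i=1}^{N} \ln p_{i, y_i}
```

The first expression is the linear score per class, the second is the softmax over those scores, and the third is the summed cross-entropy that gradient descent minimises.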
Problem statement
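Implement `train_softmaxreg(X, y, learning_rate, iterations)`, which trains softmax regression by full-batch gradient descent and returns the learned parameters together with the cross-entropy loss at each step. One-hot encode the integer labels into $Y \in \{0, 1\}^{N \times C}$, prepend a bias column of ones to $X$, start from $B = 0$ (shape $(M+1) \times C$), and at each iteration:
- compute the class probabilities $P = \mathrm{softmax}(XB)$, taking the softmax over each row (here $X$ already includes the bias column);
- record the summed cross-entropy $L = -\sum_{i} \sum_{c} Y_{ic} \ln P_{ic}$;
- update $B \leftarrow B - \eta\, X^\top (P - Y)$, where $\eta$ is `learning_rate`.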
Input
- `X` (`np.ndarray` of shape `(N, M)`): `N` samples with `M` features. Do not include a bias column; the function prepends one.
- `y` (`np.ndarray` of shape `(N,)`): integer class labels in $\{0, \dots, C-1\}$ (classes start at 0).
- `learning_rate` (`float`): the step size $\eta$.
- `iterations` (`int`): the number of full-batch gradient steps.
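The setup implied by these inputs (bias column, one-hot labels, zero parameters) might be sketched as follows. `prepare_inputs` is a hypothetical helper name used only for illustration; the problem does not require this decomposition.

```python
import numpy as np

def prepare_inputs(X, y):
    """Hypothetical helper: bias-augment X, one-hot encode y, zero-init B."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    N, M = X.shape
    C = int(y.max()) + 1                      # classes are 0 .. C-1
    Xb = np.hstack([np.ones((N, 1)), X])      # (N, M+1), bias column first
    Y = np.eye(C)[y]                          # (N, C), one-hot rows
    B = np.zeros((M + 1, C))                  # one column per class
    return Xb, Y, B

Xb, Y, B = prepare_inputs([[0.5, -1.2], [-0.3, 1.1], [0.8, -0.6]], [0, 1, 2])
print(Xb.shape, Y.shape, B.shape)  # (3, 3) (3, 3) (3, 3)
```

Indexing `np.eye(C)` with the label vector picks one identity row per sample, which is a compact way to one-hot encode when the labels are already integers starting at 0.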
Output
Returns (B, losses):
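- `B` (`list[list[float]]` of shape `(C, M+1)`): the parameter matrix transposed (one row per class, bias first), rounded to 4 decimals.
- `losses` (`list[float]` of length `iterations`): the summed cross-entropy recorded at each iteration, rounded to 4 decimals. Each value is computed before that iteration's update, so `losses[0]` is the loss at `B = 0`.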
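Converting the raw arrays into the documented return types could look like the sketch below. `package_outputs` is a hypothetical name, and the input values are made up purely to show the transpose and rounding.

```python
import numpy as np

def package_outputs(B, losses):
    """Hypothetical helper: transpose, round to 4 decimals, convert to lists."""
    B_out = np.round(np.asarray(B, dtype=float).T, 4).tolist()   # (C, M+1) nested list
    losses_out = [round(float(v), 4) for v in losses]
    return B_out, losses_out

B_raw = np.array([[0.12346, -0.5], [1.0, 2.00004]])   # made-up (M+1, C) = (2, 2) values
B_out, losses_out = package_outputs(B_raw, [3.29583687, 3.26])
print(B_out)       # [[0.1235, 1.0], [-0.5, 2.0]]
print(losses_out)  # [3.2958, 3.26]
```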
Examples
Example 1
Input: X = [[0.5, -1.2], [-0.3, 1.1], [0.8, -0.6]], y = [0, 1, 2]
learning_rate = 0.01, iterations = 10
Output: B = [[-0.0011, 0.0145, -0.0921],
[ 0.0020, -0.0598, 0.1263],
[-0.0009, 0.0453, -0.0342]]
losses[0] = 3.2958, losses[-1] = 3.0110 (10 values, decreasing)
Explanation: with $B = 0$ every class probability is $1/C = 1/3$, so the first loss is $N \ln C = 3 \ln 3 \approx 3.2958$. Each step shifts probability mass toward the correct class, so the loss falls.
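The first-loss arithmetic can be checked directly on the example data. This is a minimal sketch of the forward pass at zero initialisation only, not the full training loop; the variable names are illustrative.

```python
import numpy as np

X = np.array([[0.5, -1.2], [-0.3, 1.1], [0.8, -0.6]])
y = np.array([0, 1, 2])
N, C = X.shape[0], int(y.max()) + 1

Xb = np.hstack([np.ones((N, 1)), X])   # prepend the bias column
B = np.zeros((Xb.shape[1], C))         # zero initialisation

Z = Xb @ B                             # all scores are 0
P = np.exp(Z) / np.exp(Z).sum(axis=1, keepdims=True)   # every entry is 1/C
first_loss = float(-np.log(P[np.arange(N), y]).sum())  # summed cross-entropy

print(round(first_loss, 4), round(float(N * np.log(C)), 4))  # 3.2958 3.2958
```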
Constraints
- One-hot encode `y` with `C = y.max() + 1` (classes start at 0); prepend a ones column to `X`; initialise `B = 0`.
- Take the softmax per row (over classes); use the summed cross-entropy and the gradient $X^\top (P - Y)$, where $X$ includes the bias column.
- Return `B.T` (shape `(C, M+1)`) and all losses rounded to 4 decimals.
- Tests compare with `atol=1e-3`.
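One way to gain confidence in the gradient $X^\top (P - Y)$ of the summed cross-entropy is a finite-difference check. The sketch below assumes a bias-augmented matrix `Xb` and one-hot labels `Y`; all helper names and the random data are illustrative and not part of the problem's API.

```python
import numpy as np

def softmax_rows(Z):
    E = np.exp(Z)                              # raw softmax, fine for small scores
    return E / E.sum(axis=1, keepdims=True)

def summed_cross_entropy(B, Xb, Y):
    P = softmax_rows(Xb @ B)
    return -(Y * np.log(P)).sum()

def analytic_gradient(B, Xb, Y):
    return Xb.T @ (softmax_rows(Xb @ B) - Y)   # shape (M+1, C)

def numeric_gradient(B, Xb, Y, eps=1e-6):
    """Central differences, one parameter at a time."""
    G = np.zeros_like(B)
    for idx in np.ndindex(*B.shape):
        step = np.zeros_like(B)
        step[idx] = eps
        G[idx] = (summed_cross_entropy(B + step, Xb, Y)
                  - summed_cross_entropy(B - step, Xb, Y)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
Xb = np.hstack([np.ones((5, 1)), rng.normal(size=(5, 2))])   # 5 samples, 2 features
Y = np.eye(3)[rng.integers(0, 3, size=5)]                    # 3 classes
B = rng.normal(scale=0.1, size=(3, 3))

max_diff = np.abs(analytic_gradient(B, Xb, Y) - numeric_gradient(B, Xb, Y)).max()
print(max_diff < 1e-6)  # True
```

Because the loss is summed rather than averaged, there is no `1/N` factor in the gradient; an averaged loss would scale both sides by `1/N`.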
Notes
- The softmax-cross-entropy gradient collapses to the same clean residual form as linear and logistic regression — the recurring identity that makes these models train stably.
- Numerical stability: a production softmax subtracts each row's maximum score before exponentiating to avoid overflow; with zero initialisation and few steps the raw version is safe.
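The stability note can be illustrated with a max-shifted softmax. This is a sketch of the standard trick, with `stable_softmax` as an illustrative name; the problem itself does not require it.

```python
import numpy as np

def stable_softmax(Z):
    """Row-wise softmax that shifts each row by its maximum score."""
    Z = np.asarray(Z, dtype=float)
    shifted = Z - Z.max(axis=1, keepdims=True)   # largest entry per row becomes 0
    E = np.exp(shifted)                          # every exponent is <= 0, so no overflow
    return E / E.sum(axis=1, keepdims=True)

Z = np.array([[1000.0, 1001.0, 1002.0],          # exp(1000) overflows in float64
              [0.0, 0.0, 0.0]])
P = stable_softmax(Z)
print(np.isfinite(P).all())   # True
print(P.sum(axis=1))          # each row sums to 1
```

Subtracting a per-row constant leaves the softmax unchanged, because the factor `exp(-max)` cancels between numerator and denominator.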
This problem ships 4 hidden tests. They run in your browser via Pyodide, with no backend and no submission queue.
- Reproduces the reference: B matrix and loss trajectory
- First loss equals N*ln(C) from uniform init
- Cross-entropy decreases monotonically
- B has shape (C, M+1): one row per class, bias included