Temperature scaling
Background
Temperature is the simplest knob in next-token sampling: one scalar reshapes the entire output distribution between "fully deterministic" and "pure noise", without retraining or changing the model at all. It is just a softmax with the logits divided by T first, so it inherits softmax's overflow pitfall and the same max-shift fix.
Problem statement
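Implement temperature_scale(logits, T): divide every logit by T, then take a numerically stable softmax:
$$p_i = \frac{\exp\left(z_i - \max_j z_j\right)}{\sum_k \exp\left(z_k - \max_j z_j\right)}, \qquad z = \text{logits} / T$$
Subtract the max after the division (before exp) for stability.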
Input
- logits: 1-D np.ndarray of arbitrary real values.
- T: float, the temperature.
Output
Returns a 1-D np.ndarray of the same shape, summing to 1.
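The specification above can be sketched as follows. This is one possible implementation, not necessarily the reference solution; it assumes NumPy and a strictly positive T. The statement does not say how T <= 0 should be handled, so the guard is an assumption.

```python
import numpy as np

def temperature_scale(logits: np.ndarray, T: float) -> np.ndarray:
    """Softmax of logits / T, computed with the max-shift trick."""
    if T <= 0:
        # Assumption: the problem statement does not define T <= 0, so reject it.
        raise ValueError("T must be positive")
    z = np.asarray(logits, dtype=np.float64) / T  # divide by T first
    z = z - z.max()                               # shift so the largest entry is 0
    e = np.exp(z)                                 # every exponent is <= 0, so no overflow
    return e / e.sum()                            # normalize to a distribution

probs = temperature_scale(np.array([1.0, 2.0, 3.0, 0.5]), T=1.0)
print(probs, probs.sum())
```

The shift does not change the result, because the common factor exp(-z.max()) cancels between numerator and denominator.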
Examples
Example 1 — low temperature sharpens toward the argmax
Input: logits = [1.0, 2.0, 3.0, 0.5], T = 0.01
Output: ≈ [~0, ~0, ~1, ~0] (mass concentrates on index 2)
Explanation: dividing by T = 0.01 multiplies every gap by 100, so softmax piles almost all the probability onto the largest logit, which is equivalent to greedy/argmax decoding.
Example 2 — high temperature flattens toward uniform
Input: logits = [1.0, 2.0, 3.0, 4.0], T = 1000
Output: ≈ [0.25, 0.25, 0.25, 0.25]
Explanation: dividing by T = 1000 shrinks all gaps toward 0, so the exponentials are nearly equal and the distribution flattens to uniform, which is maximally random.
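Both examples can be checked numerically. The helper below is a compact sketch of temperature_scale that assumes T > 0; the variable names sharp and flat are illustrative only.

```python
import numpy as np

def temperature_scale(logits, T):
    # Compact sketch: divide by T, shift by the max, exponentiate, normalize.
    z = np.asarray(logits, dtype=np.float64) / T
    e = np.exp(z - z.max())
    return e / e.sum()

sharp = temperature_scale([1.0, 2.0, 3.0, 0.5], T=0.01)  # Example 1
flat = temperature_scale([1.0, 2.0, 3.0, 4.0], T=1000)   # Example 2

print(sharp.argmax(), sharp.max())  # index of the peak and the mass it holds
print(flat)                         # every entry close to 0.25
```

In Example 1 the scaled logits are [100, 200, 300, 50], so the runner-up trails the maximum by 100 and contributes only about exp(-100) of relative mass.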
Constraints
- Compute softmax(logits / T); the output is a valid distribution (all positive, sums to 1).
- Numerical stability: with z = logits / T, subtract z.max() after dividing by T and before exp; large logits like [1000, 1001, 1002] otherwise overflow to inf.
- Regimes: T = 1 is plain softmax (unchanged); T → 0 sharpens to a one-hot at the argmax; T → ∞ flattens toward uniform.
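The overflow described in the stability constraint is easy to reproduce. The sketch below compares a naive softmax with the max-shifted one on the logits from the constraint, using T = 1 as an arbitrary choice; the names naive and stable are illustrative.

```python
import numpy as np

logits = np.array([1000.0, 1001.0, 1002.0])
T = 1.0
z = logits / T

# Naive softmax: exp(1000) exceeds the float64 range, so each entry is inf
# and inf / inf evaluates to nan.
with np.errstate(over="ignore", invalid="ignore"):
    naive = np.exp(z) / np.exp(z).sum()

# Max-shifted softmax: the exponents become [-2, -1, 0], all safely representable.
shifted = np.exp(z - z.max())
stable = shifted / shifted.sum()

print(naive)   # all nan
print(stable)  # a valid distribution
```

Only the differences between logits matter to softmax, which is why shifting by the maximum leaves the probabilities unchanged.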
Notes
- Typical settings. In LLM sampling, a higher T gets creative/weird and a lower T gets stiff and repetitive. It is usually combined with a truncation step like top-k or top-p.
- It's just softmax. This reuses the exact max-shift trick from "softmax from scratch"; the only addition is the / T before exponentiating.
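As an illustration of combining temperature with a truncation step, here is a rough sketch of temperature scaling followed by top-k sampling. The function sample_top_k and its parameters k and rng are hypothetical names introduced for this example, and the sketch shows only one simple way to do the truncation.

```python
import numpy as np

def sample_top_k(logits, T, k, rng):
    """Hypothetical helper: temperature-scale, keep the k most likely tokens, sample one."""
    z = np.asarray(logits, dtype=np.float64) / T
    e = np.exp(z - z.max())
    probs = e / e.sum()
    keep = np.argsort(probs)[-k:]      # indices of the k largest probabilities
    truncated = np.zeros_like(probs)
    truncated[keep] = probs[keep]
    truncated /= truncated.sum()       # renormalize over the surviving tokens
    return rng.choice(len(probs), p=truncated)

rng = np.random.default_rng(0)
draws = [sample_top_k([1.0, 2.0, 3.0, 0.5], T=0.8, k=2, rng=rng) for _ in range(1000)]
print(sorted(set(int(d) for d in draws)))  # only indices among the top 2 can appear
```

Temperature decides how the mass is spread, and the truncation then removes the low-probability tail before sampling.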
This problem ships 5 hidden tests. They run in your browser via Pyodide, with no backend and no submission queue.
- Output is a probability distribution
- T = 1.0 is plain softmax
- Diagnostic: low T sharpens toward one-hot on the argmax
- High T flattens toward uniform
- Numerically stable for large logits