Temperature scaling (Easy)

Background

Temperature is the simplest knob in next-token sampling: one scalar T > 0 reshapes the entire output distribution anywhere between "fully deterministic" and "pure noise", without retraining or otherwise changing the model. It is just a softmax applied to the logits after dividing them by T, so it inherits softmax's overflow pitfall and the same max-shift fix.
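To see the knob in action, the sketch below sweeps T over a fixed logit vector and reports the top probability and the Shannon entropy of the result. It assumes NumPy; the logits are borrowed from Example 1 below, and `entropy_bits` is a hypothetical helper introduced only for this illustration.

```python
import numpy as np

def temperature_scale(logits, T):
    # Divide by T, shift by the max for stability, then normalize.
    z = np.asarray(logits, dtype=np.float64) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def entropy_bits(p):
    # Hypothetical helper: Shannon entropy in bits, treating 0 * log(0) as 0.
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

logits = np.array([1.0, 2.0, 3.0, 0.5])  # same logits as Example 1
for T in (0.01, 0.5, 1.0, 2.0, 100.0):
    p = temperature_scale(logits, T)
    print(f"T={T:>6}: max p = {p.max():.3f}, entropy = {entropy_bits(p):.3f} bits")
```

The entropy rises with T, from essentially 0 bits (nearly one-hot) toward log2(4) = 2 bits (uniform over four tokens).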

Problem statement

Implement temperature_scale(logits, T): divide every logit z_i by T, then take a numerically stable softmax:

$$p'_i = \frac{e^{z_i / T}}{\sum_j e^{z_j / T}}$$

Subtract the max after the division (before exp) for stability.
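Why the shift is harmless: let m be the largest scaled logit. Multiplying the numerator and denominator by the same factor e^{-m} leaves every p'_i unchanged:

$$
p'_i = \frac{e^{z_i/T}}{\sum_j e^{z_j/T}}
     = \frac{e^{-m}\, e^{z_i/T}}{e^{-m} \sum_j e^{z_j/T}}
     = \frac{e^{z_i/T - m}}{\sum_j e^{z_j/T - m}},
\qquad m = \max_j \frac{z_j}{T}
$$

After the shift every exponent is at most 0, so no exponential can overflow, and the denominator is at least 1 because the argmax term contributes e^0 = 1.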

Input

  • logits: 1-D np.ndarray of arbitrary real values.
  • T: float > 0, the temperature.

Output

Returns a 1-D np.ndarray of the same shape, summing to 1.

Examples

Example 1 — low temperature sharpens toward the argmax

Input:  logits = [1.0, 2.0, 3.0, 0.5], T = 0.01
Output: ≈ [0, 0, 1, 0]   (mass concentrates on index 2)

Explanation: dividing by T = 0.01 multiplies every gap between logits by 100, so softmax piles almost all the probability onto the largest logit; as T → 0⁺ this becomes equivalent to greedy (argmax) decoding.
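A quick check of Example 1, as a minimal NumPy sketch: it computes the gaps to the top logit before and after dividing by T = 0.01 and confirms that the scaled distribution picks the same index as greedy argmax decoding on the raw logits.

```python
import numpy as np

def temperature_scale(logits, T):
    # Scale, shift by the max, exponentiate, normalize.
    z = np.asarray(logits, dtype=np.float64) / T
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([1.0, 2.0, 3.0, 0.5])
T = 0.01
p = temperature_scale(logits, T)

gaps = logits.max() - logits   # [2.0, 1.0, 0.0, 2.5]
scaled_gaps = gaps / T         # about [200, 100, 0, 250]

print(np.round(p, 6))                               # mass sits on index 2
print(int(np.argmax(p)) == int(np.argmax(logits)))  # agrees with greedy argmax
```

The runner-up (index 1) keeps a probability of roughly e^{-100}, about 3.7e-44: positive, but negligible.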

Example 2 — high temperature flattens toward uniform

Input:  logits = [1.0, 2.0, 3.0, 4.0], T = 1000
Output: ≈ [0.25, 0.25, 0.25, 0.25]

Explanation: dividing by T = 1000 shrinks all gaps toward 0, so the exponentials are nearly equal and the distribution flattens toward uniform, which is maximally random.
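The flattening in Example 2 can be measured directly. This sketch, assuming NumPy, prints the largest absolute difference between the scaled distribution and the uniform one as T grows.

```python
import numpy as np

def temperature_scale(logits, T):
    z = np.asarray(logits, dtype=np.float64) / T
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([1.0, 2.0, 3.0, 4.0])
uniform = np.full(logits.shape, 1.0 / logits.size)

# Largest per-entry deviation from uniform; it shrinks as T grows.
for T in (1.0, 10.0, 1000.0):
    p = temperature_scale(logits, T)
    print(T, np.abs(p - uniform).max())
```

At T = 1 the top token still holds about 64% of the mass; by T = 1000 every entry is within a fraction of a percent of 0.25.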

Constraints

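  • Compute softmax(logits / T); the output must be a valid probability distribution: every entry is non-negative (strictly positive in exact arithmetic, though tiny values may underflow to 0.0) and the entries sum to 1.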
  • Numerical stability: subtract z.max() after dividing by T, before exp — large logits like [1000, 1001, 1002] otherwise overflow to inf.
  • Regimes: T = 1 is plain softmax (unchanged); T → 0⁺ sharpens to a one-hot at the argmax (if several logits tie for the maximum, the mass splits evenly among them); T → ∞ flattens toward uniform.
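The constraints above translate into a few lines of NumPy. The following is a minimal sketch, not the official reference solution; the `T <= 0` check is an added assumption (the problem guarantees T > 0), and `naive_temperature_scale` is a hypothetical unshifted variant included only to show the overflow.

```python
import numpy as np

def temperature_scale(logits: np.ndarray, T: float) -> np.ndarray:
    """Softmax of logits / T, computed with the max-shift trick."""
    if T <= 0:
        raise ValueError("temperature T must be > 0")
    z = np.asarray(logits, dtype=np.float64) / T  # scale first
    z = z - z.max()                                # then shift: largest exponent is 0
    e = np.exp(z)                                  # every value lies in [0, 1]
    return e / e.sum()                             # denominator >= 1, so no 0/0

def naive_temperature_scale(logits, T):
    # Same formula without the shift: exp overflows for large logits.
    e = np.exp(np.asarray(logits, dtype=np.float64) / T)
    return e / e.sum()

big = np.array([1000.0, 1001.0, 1002.0])
with np.errstate(over="ignore", invalid="ignore"):
    print(naive_temperature_scale(big, 1.0))  # inf / inf gives nan entries
print(temperature_scale(big, 1.0))            # finite and sums to 1
```

Because the shift is applied after scaling, the same code stays safe at small T, where dividing by T can turn moderate logits into huge ones.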

Notes

  • Typical settings. T ∈ [0.7, 1.0] is common for natural-sounding LLM output; higher values make the text more creative and eventually incoherent, while lower values make it stiffer and more repetitive. Temperature is usually combined with a truncation step such as top-k or top-p.
  • It's just softmax. This reuses the exact max-shift trick from the "softmax from scratch" exercise; the only addition is dividing by T before shifting and exponentiating.
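As noted above, temperature is usually paired with truncation. Here is a minimal sketch of temperature plus top-k sampling, assuming NumPy's `Generator` API; `sample_top_k_with_temperature` is a hypothetical helper, not a library function. Because dividing by T > 0 never changes the ranking of logits, the top-k set is the same whether it is chosen before or after scaling.

```python
import numpy as np

def sample_top_k_with_temperature(logits, T, k, rng):
    """Hypothetical helper: keep the k largest logits, apply temperature, sample one index."""
    z = np.asarray(logits, dtype=np.float64)
    k = min(k, z.size)
    # Indices of the k largest logits (their order within the k does not matter).
    top = np.argpartition(z, -k)[-k:]
    # Temperature-scaled, max-shifted softmax over the survivors only.
    s = z[top] / T
    e = np.exp(s - s.max())
    p = e / e.sum()
    # Map the sampled position back to an index into the full logit vector.
    return int(top[rng.choice(k, p=p)])

rng = np.random.default_rng(0)
logits = np.array([1.0, 2.0, 3.0, 0.5])
draws = [sample_top_k_with_temperature(logits, T=0.8, k=2, rng=rng) for _ in range(1000)]
print(np.bincount(draws, minlength=logits.size))  # only indices 1 and 2 appear
```

With k = 1 this reduces to greedy decoding regardless of T.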

This problem ships 5 hidden tests, which run in the browser via Pyodide (no backend, no submission queue):

  • Output is a probability distribution
  • T = 1.0 is plain softmax
  • Diagnostic: low T sharpens toward one-hot on the argmax
  • High T flattens toward uniform
  • Numerically stable for large logits
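The hidden tests themselves are not shown. The assertions below are only a guess at local checks matching the five titles above, written against a minimal NumPy sketch of `temperature_scale`; `plain_softmax` is a hypothetical comparison helper, and the thresholds are illustrative choices.

```python
import numpy as np

def temperature_scale(logits, T):
    z = np.asarray(logits, dtype=np.float64) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def plain_softmax(x):
    # Hypothetical comparison helper: stable softmax with no temperature.
    x = np.asarray(x, dtype=np.float64)
    e = np.exp(x - x.max())
    return e / e.sum()

x = np.array([1.0, 2.0, 3.0, 0.5])

# Output is a probability distribution.
p = temperature_scale(x, 0.7)
assert p.shape == x.shape and np.all(p >= 0) and np.isclose(p.sum(), 1.0)

# T = 1.0 is plain softmax.
assert np.allclose(temperature_scale(x, 1.0), plain_softmax(x))

# Low T sharpens toward one-hot on the argmax.
assert np.argmax(temperature_scale(x, 0.01)) == np.argmax(x)
assert temperature_scale(x, 0.01).max() > 0.999

# High T flattens toward uniform.
assert np.allclose(temperature_scale(x, 1e4), 0.25, atol=1e-3)

# Numerically stable for large logits.
big = temperature_scale(np.array([1000.0, 1001.0, 1002.0]), 1.0)
assert np.all(np.isfinite(big)) and np.isclose(big.sum(), 1.0)
print("all local checks passed")
```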