Perplexity from log-probs
Background
Perplexity is the standard intrinsic metric for language models: the exponentiated average negative log-likelihood per token. Intuitively it's the model's "branching factor": the effective number of equally likely next tokens it is choosing among. Lower is better; a perplexity of $k$ means the model is as confused as a uniform guess over $k$ options.
Problem statement
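Implement perplexity(log_probs) from the per-token natural-log probabilities the model assigned to the actual next tokens:
$$\mathrm{PPL} = \exp\left(-\frac{1}{N}\sum_{i=1}^{N} \log p_i\right)$$
where $\log p_i$ is the $i$-th entry of log_probs and $N$ is the number of tokens.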
Input
log_probs: array-like of float, the natural log of the probability the model gave to each ground-truth token.
Output
Returns a float: the perplexity ($\geq 1$, since every log-probability is $\leq 0$).
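One way to meet this input/output contract can be sketched with the standard library alone. This is a minimal sketch rather than the reference solution: raising ValueError on empty input is an assumption, since the problem statement does not say how an empty sequence should be handled.

```python
import math


def perplexity(log_probs):
    """Perplexity from natural-log probabilities of the ground-truth tokens."""
    # Converting element by element accepts lists, tuples, and NumPy arrays.
    values = [float(lp) for lp in log_probs]
    if not values:
        # Assumption: empty input is treated as an error (not specified by the problem).
        raise ValueError("log_probs must contain at least one value")
    # Mean negative log-likelihood in nats; fsum limits floating-point rounding error.
    mean_nll = -math.fsum(values) / len(values)
    return math.exp(mean_nll)


# Example 1: every token assigned probability 0.5.
example = perplexity([-0.6931, -0.6931, -0.6931])
```

Here example comes out within the stated atol=1e-4 of 2.0; the small gap exists because -0.6931 is a rounded value of ln 0.5.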
Examples
Example 1
Input: log_probs = [-0.6931, -0.6931, -0.6931] # each token assigned prob 0.5
Output: 2.0
Explanation: the mean negative log-likelihood is $0.6931 \approx \ln 2$, so $\mathrm{PPL} = e^{0.6931} \approx 2$; the model is as uncertain as a fair coin at each step.
Constraints
- Inputs are natural logs; perplexity is $\exp(-\mathrm{mean}(\mathrm{log\_probs}))$.
- A confident, correct model (log-probs near 0) has perplexity near 1.
- Tests compare with atol=1e-4.
Notes
- Perplexity is simply an exponentiated cross-entropy: $\mathrm{PPL} = e^{H}$ for $H$ in nats (or $2^{H}$ for $H$ in bits).
- Because it is the inverse geometric mean of the per-token probabilities, a single near-zero-probability token (a very negative log-prob) can blow perplexity up dramatically.
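Both notes can be checked numerically. The sketch below uses two hypothetical helper names, ppl_from_nats and ppl_from_bits, and made-up probabilities (0.9 for ordinary tokens, 1e-6 for the outlier) chosen only for illustration.

```python
import math


def ppl_from_nats(log_probs):
    # exp of the cross-entropy H measured in nats.
    h_nats = -sum(log_probs) / len(log_probs)
    return math.exp(h_nats)


def ppl_from_bits(log_probs):
    # Same quantity with H converted to bits: H_bits = H_nats / ln 2.
    h_bits = -sum(log_probs) / len(log_probs) / math.log(2)
    return 2.0 ** h_bits


# Ten tokens, all predicted with probability 0.9.
steady = [math.log(0.9)] * 10
# Same sequence, but one token was given probability 1e-6.
with_outlier = [math.log(0.9)] * 9 + [math.log(1e-6)]

base = ppl_from_nats(steady)          # 1 / 0.9, roughly 1.11
spiked = ppl_from_nats(with_outlier)  # roughly 4.4
```

The nats and bits forms agree up to floating-point rounding, and a single badly predicted token out of ten roughly quadruples the perplexity in this made-up case.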
This problem ships 4 hidden tests, listed below. They run in your browser via Pyodide, with no backend and no submission queue.
- Reference example: prob 0.5 each -> perplexity 2
- Uniform over V tokens gives perplexity V
- A confident, correct model has perplexity near 1
- Equals exp of the mean negative log-likelihood
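The hidden tests themselves are not shown, so the checks below are only a guess at what the four descriptions might look like as assertions. The compact perplexity function is a stand-in for your own solution, and the specific values (V = 50, probability 0.999, the 1.01 threshold, the sample log-probs) are arbitrary choices.

```python
import math


def perplexity(log_probs):
    # Compact stand-in for a solution: exp of the mean negative log-likelihood.
    return math.exp(-sum(log_probs) / len(log_probs))


ATOL = 1e-4  # tolerance quoted in the constraints

# Reference example: probability 0.5 for every token -> perplexity 2.
assert math.isclose(perplexity([math.log(0.5)] * 3), 2.0, abs_tol=ATOL)

# Uniform over V tokens -> perplexity V.
V = 50
assert math.isclose(perplexity([math.log(1.0 / V)] * 8), V, abs_tol=ATOL)

# Confident, correct model -> perplexity near 1.
assert perplexity([math.log(0.999)] * 5) < 1.01

# Equals exp of the mean negative log-likelihood, computed independently here.
sample = [-0.1, -2.3, -0.7, -1.2]
expected = math.exp(-sum(sample) / len(sample))
assert math.isclose(perplexity(sample), expected, abs_tol=ATOL)
```

The last check is trivially true for the stand-in; it becomes meaningful once perplexity is replaced with a separately written implementation.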