Perplexity from log-probs

Difficulty: Easy

Background

Perplexity is the standard intrinsic metric for language models: the exponentiated average negative log-likelihood per token. Intuitively, it is the model's "branching factor": the effective number of equally likely next tokens it is choosing among. Lower is better; a perplexity of $V$ means the model is as uncertain as a uniform guess over $V$ options.

Problem statement

Implement perplexity(log_probs) from the per-token natural-log probabilities the model assigned to the actual next tokens:

$$\text{PP} = \exp\!\Big(-\frac{1}{N}\sum_{i=1}^{N}\log p(x_i)\Big)$$

Input

  • log_probs — array-like of float: the natural log of the probability the model gave to each ground-truth token.

Output

Returns a float: the perplexity ($\ge 1$).
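One way to turn the formula above into code is sketched here, using only the standard library. The empty-input check is an assumption on our part (the problem statement does not say what to do with an empty array), and this is one possible solution rather than the reference one.

```python
import math

def perplexity(log_probs):
    """Return exp of the mean negative natural-log probability."""
    # Converting to a list of floats accepts lists, tuples, and NumPy arrays alike.
    values = [float(lp) for lp in log_probs]
    if not values:
        # Assumption: perplexity of an empty sequence is undefined, so we raise.
        raise ValueError("log_probs must contain at least one value")
    # fsum reduces floating-point rounding error when summing many terms.
    mean_nll = -math.fsum(values) / len(values)
    return math.exp(mean_nll)
```

Averaging in log space before exponentiating avoids multiplying many small probabilities together, which would underflow for long sequences.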

Examples

Example 1

Input:  log_probs = [-0.6931, -0.6931, -0.6931]   # each token assigned prob 0.5
Output: 2.0

Explanation: the mean negative log-likelihood is $0.6931 \approx \ln 2$, so $\text{PP} = e^{0.6931} \approx 2$; the model is as uncertain as a fair coin at each step.
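The arithmetic in Example 1 can be checked with a short script. It compares the rounded input from the example against the exact value `math.log(0.5)`; the compact `perplexity` helper here is a sketch, not the reference solution.

```python
import math

def perplexity(log_probs):
    values = list(log_probs)
    return math.exp(-math.fsum(values) / len(values))

# Rounded log-probs exactly as given in the example.
rounded = perplexity([-0.6931, -0.6931, -0.6931])
# Exact natural log of 0.5 for comparison.
exact = perplexity([math.log(0.5)] * 3)

print(rounded)
print(exact)
# The rounded input is still within the stated tolerance of 2.0.
print(abs(rounded - 2.0) < 1e-4)
```

Because the example rounds $\ln 2$ to four decimals, the result lands slightly below 2.0, which is why the tests compare with an absolute tolerance rather than exact equality.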

Constraints

  • Inputs are natural logs; perplexity is $\exp(-\text{mean log-prob})$.
  • A confident, correct model (log-probs near 0) has perplexity near 1.
  • Tests compare with atol=1e-4.
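The two limiting cases can be illustrated with a small sketch: a uniform distribution over a vocabulary and a near-certain model. The vocabulary size `V = 50` and the log-prob value `-0.001` are arbitrary values chosen for illustration.

```python
import math

def perplexity(log_probs):
    values = list(log_probs)
    return math.exp(-math.fsum(values) / len(values))

V = 50  # hypothetical vocabulary size

# Every token gets probability 1/V, so perplexity should come out as V.
uniform = perplexity([math.log(1.0 / V)] * 10)

# Log-probs near 0 mean probabilities near 1, so perplexity is near 1.
confident = perplexity([-0.001] * 5)

print(uniform, confident)
```

The sequence length does not matter in either case, since perplexity is a per-token average.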

Notes

  • Perplexity is simply an exponentiated cross-entropy: $e^{H}$ for $H$ in nats (or $2^{H}$ for $H$ in bits).
  • Because it is the geometric mean of the inverse per-token probabilities, a single near-zero-probability token (a very negative log-prob) can blow perplexity up dramatically.
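Both notes can be checked numerically. The sketch below converts the cross-entropy from nats to bits and confirms that both routes give the same perplexity, then compares a sequence of well-predicted tokens against the same sequence with one outlier. The probabilities 0.9 and 1e-9 are made-up illustrative values.

```python
import math

def cross_entropy_nats(log_probs):
    """Mean negative log-likelihood in nats."""
    values = list(log_probs)
    return -math.fsum(values) / len(values)

# Ten tokens predicted with probability 0.9 each.
typical = [math.log(0.9)] * 10
# Same sequence, but one token received probability 1e-9.
with_outlier = [math.log(0.9)] * 9 + [math.log(1e-9)]

h_nats = cross_entropy_nats(with_outlier)
h_bits = h_nats / math.log(2)  # nats -> bits

pp_from_nats = math.exp(h_nats)  # e^H with H in nats
pp_from_bits = 2.0 ** h_bits     # 2^H with H in bits
pp_typical = math.exp(cross_entropy_nats(typical))

print(pp_typical, pp_from_nats, pp_from_bits)
```

With these values, one badly predicted token out of ten raises perplexity from about 1.11 to about 8.7.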

This problem ships 4 hidden tests, which run in the browser via Pyodide:

  • Reference example: prob 0.5 each -> perplexity 2
  • Uniform over V tokens gives perplexity V
  • A confident, correct model has perplexity near 1
  • Equals exp of the mean negative log-likelihood