Repetition penalty (HuggingFace-style)Medium

Repetition penalty (HuggingFace-style)

Background

The HuggingFace-style repetition penalty (Keskar et al., 2019, CTRL) is the standard fix when a language model gets stuck looping ("the the the…") and you don't want to disable sampling entirely. It down-weights the logits of tokens that have already appeared, so they're less likely to be chosen again. The twist is an asymmetric rule — divide if the logit is positive, multiply if negative — so the penalty always pushes the logit toward zero or below, whatever its sign.

Problem statement

Implement apply_repetition_penalty(logits, prefix_ids, penalty), mutating logits in place. For each token id tt that appears in prefix_ids:

logits[t]{logits[t]/penaltyif logits[t]>0logits[t]penaltyif logits[t]<0logits[t]if logits[t]=0\text{logits}[t] \leftarrow \begin{cases} \text{logits}[t] / \text{penalty} & \text{if } \text{logits}[t] > 0 \\ \text{logits}[t] \cdot \text{penalty} & \text{if } \text{logits}[t] < 0 \\ \text{logits}[t] & \text{if } \text{logits}[t] = 0 \end{cases}

Apply it once per unique token (a token appearing five times is penalised once), and return the same (mutated) array.

Input

  • logits — 1-D np.ndarray of length V: pre-softmax scores. Mutated in place.
  • prefix_idslist[int] or 1-D array: the token ids already emitted.
  • penaltyfloat 1.0\ge 1.0: the penalty strength (1.0 = no-op).

Output

Returns the mutated logits array (the same reference).

Examples

Example 1 — the asymmetric rule in both directions

Input:  logits = [2.0, -1.5, -3.0, 5.0], prefix_ids = [1, 2], penalty = 2.0
Output: [2.0, -3.0, -6.0, 5.0]

Explanation: index 1 (1.5<0-1.5 < 0) is multiplied3.0-3.0; index 2 (3.0<0-3.0 < 0) → 6.0-6.0. Indices 0 and 3 aren't in the prefix, so they're untouched. Both penalised logits move further below zero — less likely after softmax.

Example 2 — penalty applied once per unique id

Input:  logits = [4.0, 0.0];  prefix_ids = [0]  vs  [0, 0, 0, 0];  penalty = 2.0
Output: both -> [2.0, 0.0]

Explanation: token 0 (a positive logit) is divided once: 4/2=24/2 = 2. Repeating it in the prefix doesn't compound the penalty — set(prefix_ids) deduplicates, so it's not 4/244/2^4.

Constraints

  • Sign-aware: positive logits are divided, negative multiplied, zeros left alone — both branches push the logit downward.
  • Apply once per unique token id (deduplicate prefix_ids).
  • Mutate in place (/=, *=) and return the same array reference; penalty = 1.0 leaves logits unchanged.
  • After the penalty, a seen token is strictly less likely under softmax than without it.

Notes

  • Why asymmetric. Naively dividing a negative logit would make it less negative — i.e. more likely, the opposite of the goal. Branching on the sign guarantees the penalty always reduces a seen token's probability.
  • Tuning & context. Typical penalty is 1.1–1.3; higher is too aggressive, lower barely moves the distribution. It's applied to the logits before a sampling step like top-k / top-p.
Python
Loading...

This problem ships 6 hidden tests. They run in your browser via Pyodide — no backend, no submission queue. Press ▶ Run tests to execute.

  • Returns the same array reference (in-place mutation)
  • penalty=1.0 is a no-op
  • Diagnostic: positive logit at a seen token gets DIVIDED by penalty
  • Diagnostic: negative logit at a seen token gets MULTIPLIED by penalty
  • Repeated token ids in prefix do not double-apply the penalty
  • Penalty makes seen tokens strictly less likely after softmax (vs no penalty)