Repetition penalty (HuggingFace-style)
Background
The HuggingFace-style repetition penalty (Keskar et al., 2019, CTRL) is the standard fix when a language model gets stuck looping ("the the the…") and you don't want to disable sampling entirely. It down-weights the logits of tokens that have already appeared, so they're less likely to be chosen again. The twist is an asymmetric rule — divide if the logit is positive, multiply if negative — so the penalty always pushes the logit toward zero or below, whatever its sign.
Problem statement
Implement apply_repetition_penalty(logits, prefix_ids, penalty), mutating logits in place. For each token id that appears in prefix_ids:
Apply it once per unique token (a token appearing five times is penalised once), and return the same (mutated) array.
Input
logits— 1-Dnp.ndarrayof lengthV: pre-softmax scores. Mutated in place.prefix_ids—list[int]or 1-D array: the token ids already emitted.penalty—float: the penalty strength (1.0= no-op).
Output
Returns the mutated logits array (the same reference).
Examples
Example 1 — the asymmetric rule in both directions
Input: logits = [2.0, -1.5, -3.0, 5.0], prefix_ids = [1, 2], penalty = 2.0
Output: [2.0, -3.0, -6.0, 5.0]
Explanation: index 1 () is multiplied → ; index 2 () → . Indices 0 and 3 aren't in the prefix, so they're untouched. Both penalised logits move further below zero — less likely after softmax.
Example 2 — penalty applied once per unique id
Input: logits = [4.0, 0.0]; prefix_ids = [0] vs [0, 0, 0, 0]; penalty = 2.0
Output: both -> [2.0, 0.0]
Explanation: token 0 (a positive logit) is divided once: . Repeating it in the prefix doesn't compound the penalty — set(prefix_ids) deduplicates, so it's not .
Constraints
- Sign-aware: positive logits are divided, negative multiplied, zeros left alone — both branches push the logit downward.
- Apply once per unique token id (deduplicate
prefix_ids). - Mutate in place (
/=,*=) and return the same array reference;penalty = 1.0leaveslogitsunchanged. - After the penalty, a seen token is strictly less likely under softmax than without it.
Notes
- Why asymmetric. Naively dividing a negative logit would make it less negative — i.e. more likely, the opposite of the goal. Branching on the sign guarantees the penalty always reduces a seen token's probability.
- Tuning & context. Typical
penaltyis 1.1–1.3; higher is too aggressive, lower barely moves the distribution. It's applied to the logits before a sampling step like top-k / top-p.
This problem ships 6 hidden tests. They run in your browser via Pyodide — no backend, no submission queue. Press ▶ Run tests to execute.
- •Returns the same array reference (in-place mutation)
- •penalty=1.0 is a no-op
- •Diagnostic: positive logit at a seen token gets DIVIDED by penalty
- •Diagnostic: negative logit at a seen token gets MULTIPLIED by penalty
- •Repeated token ids in prefix do not double-apply the penalty
- •Penalty makes seen tokens strictly less likely after softmax (vs no penalty)