BLEU score (unigram)
Background
BLEU (Bilingual Evaluation Understudy) is the classic machine-translation metric: it measures how much a candidate translation overlaps a reference, via clipped n-gram precision and a brevity penalty that discourages too-short outputs. This problem implements the unigram (1-gram) version — the core building block of full BLEU.
Problem statement
Implement bleu_unigram(candidate, reference) returning the unigram BLEU score: the clipped unigram precision (each candidate word credited at most as often as it appears in the reference) times the brevity penalty.
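$$\text{BLEU}_1 = \text{BP} \cdot p_1, \qquad p_1 = \frac{\sum_{w} \min\big(\text{count}_{\text{cand}}(w),\ \text{count}_{\text{ref}}(w)\big)}{c}, \qquad \text{BP} = \begin{cases} 1 & \text{if } c > r \\ e^{1 - r/c} & \text{if } c \le r \end{cases}$$
where $w$ ranges over the distinct candidate words, $\text{count}_{\text{cand}}(w)$ and $\text{count}_{\text{ref}}(w)$ are the number of times $w$ occurs in the candidate and in the reference, and $c$ and $r$ are the candidate and reference lengths.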
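A minimal Python sketch of `bleu_unigram`, following the formula term by term. It uses `collections.Counter` for the per-word counts and returns 0.0 for an empty candidate, where $p_1$ would otherwise divide by zero. It returns the unrounded score (0.8333 in Example 1 is 5/6 rounded to four places); whether the hidden tests compare with a tolerance or expect rounding is an assumption here, so wrap the result in `round(..., 4)` if needed.

```python
import math
from collections import Counter


def bleu_unigram(candidate: list[str], reference: list[str]) -> float:
    """Unigram BLEU: clipped unigram precision times the brevity penalty."""
    c, r = len(candidate), len(reference)
    if c == 0:
        # No candidate words: p_1 would divide by zero, so score 0.0.
        return 0.0

    # Clip each candidate word's count at its count in the reference.
    # Counter returns 0 for words missing from the reference.
    ref_counts = Counter(reference)
    clipped = sum(min(n, ref_counts[w]) for w, n in Counter(candidate).items())
    p1 = clipped / c

    # Brevity penalty: 1 if the candidate is longer than the reference, else exp(1 - r/c).
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * p1


print(round(bleu_unigram(["the", "cat", "sat", "on", "the", "mat"],
                         ["the", "cat", "is", "on", "the", "mat"]), 4))  # 0.8333
```

Computing the brevity penalty from lengths alone keeps it independent of which words match, which is why an exact match scores exactly 1.0 (both factors equal 1).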
Input
- candidate (list[str]): the predicted tokens.
- reference (list[str]): the reference tokens.
Output
Returns a float in $[0, 1]$.
Examples
Example 1
Input: candidate = ["the","cat","sat","on","the","mat"]
reference = ["the","cat","is","on","the","mat"]
Output: 0.8333
Explanation: clipped matches are the(2) + cat(1) + on(1) + mat(1) = 5 of 6 candidate words, so $p_1 = 5/6$. The lengths are equal ($c = r = 6$), so $\text{BP} = 1$, giving BLEU $= 1 \cdot 5/6 \approx 0.8333$.
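The clipped counts in Example 1 can be reproduced with `collections.Counter`, whose `&` operator is a multiset intersection: for each word it keeps the minimum of the two counts and drops words whose minimum is zero. This is a small illustration of the arithmetic, not a required part of the solution.

```python
from collections import Counter

candidate = ["the", "cat", "sat", "on", "the", "mat"]
reference = ["the", "cat", "is", "on", "the", "mat"]

# Multiset intersection keeps min(count in candidate, count in reference) per word;
# "sat" has a minimum of 0, so it is dropped.
clipped = Counter(candidate) & Counter(reference)
print(dict(clipped))          # {'the': 2, 'cat': 1, 'on': 1, 'mat': 1}

matches = sum(clipped.values())
p1 = matches / len(candidate)
print(matches, round(p1, 4))  # 5 0.8333
```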
Constraints
- Clip each word's count by its reference count, so repeats cannot be over-credited.
- Brevity penalty: $\text{BP} = 1$ when the candidate is longer than the reference ($c > r$), otherwise $\text{BP} = e^{1 - r/c}$.
- Return $\text{BP} \cdot p_1$; treat an empty candidate as $0.0$.
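To see how quickly the brevity penalty bites, here is a sketch with a hypothetical helper `brevity_penalty(c, r)` (not part of the required interface), evaluated against a reference of length 6 as the candidate shrinks.

```python
import math


def brevity_penalty(c: int, r: int) -> float:
    """Hypothetical helper: BP = 1 if c > r, else exp(1 - r/c)."""
    if c == 0:
        return 0.0  # empty candidate: exp(1 - r/c) tends to 0 as c -> 0 (for r > 0)
    return 1.0 if c > r else math.exp(1 - r / c)


# Reference of length 6; shrink the candidate to watch the penalty grow.
for c in (8, 6, 5, 3, 1):
    print(c, round(brevity_penalty(c, 6), 4))
# 8 1.0
# 6 1.0
# 5 0.8187
# 3 0.3679
# 1 0.0067
```

Dropping one word out of six costs under 20%, but a half-length candidate keeps only about 37% of its precision, so a short output cannot win just by being precise.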
Notes
- Clipping is what stops a candidate of "the the the the" from scoring perfect precision against a reference that contains "the" once: the four copies are credited only once, so $p_1 = 1/4$ instead of $4/4$.
- Full BLEU multiplies the geometric mean of the clipped n-gram precisions $p_1, \dots, p_N$ (typically $N = 4$) by the brevity penalty; this unigram version isolates the precision and brevity-penalty mechanics.
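For comparison, a sketch of sentence-level BLEU against a single reference, generalizing the unigram version to $p_1, \dots, p_N$. The helper names `ngrams` and `bleu`, the uniform weights $1/N$, and the absence of smoothing are assumptions of this illustration, not part of the problem.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate: list[str], reference: list[str], max_n: int = 4) -> float:
    """Hypothetical sentence BLEU: BP times the geometric mean of p_1..p_N.

    Assumes uniform weights 1/N and no smoothing, so any p_n == 0 yields 0.0.
    """
    c, r = len(candidate), len(reference)
    if c == 0:
        return 0.0
    log_sum = 0.0
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        total = sum(cand.values())
        clipped = sum((cand & ref).values())  # clipped n-gram matches
        if total == 0 or clipped == 0:
            return 0.0  # unsmoothed: log(0) is undefined, geometric mean is 0
        log_sum += math.log(clipped / total) / max_n  # weighted log p_n
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_sum)
```

With `max_n=1` it reproduces the unigram score (5/6 on Example 1), and `bleu(["the"] * 4, ["the", "cat"], max_n=1)` returns a score of 1/4 because clipping credits "the" only once. With the default `max_n=4`, Example 1 scores 0.0 because no 4-gram matches, which is why sentence-level BLEU is often computed with a smoothing method.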
This problem ships 4 hidden tests:
- Reference example: 5/6 with BP = 1 -> 0.8333
- Exact match -> 1.0
- Clipping prevents over-counting repeated words
- Brevity penalty punishes too-short candidates