BLEU score (unigram)Medium

BLEU score (unigram)

Background

BLEU (Bilingual Evaluation Understudy) is the classic machine-translation metric: it measures how much a candidate translation overlaps a reference, via clipped n-gram precision and a brevity penalty that discourages too-short outputs. This problem implements the unigram (1-gram) version — the core building block of full BLEU.

Problem statement

Implement bleu_unigram(candidate, reference) returning the unigram BLEU score: the clipped unigram precision (each candidate word credited at most as often as it appears in the reference) times the brevity penalty.

p1=wmin(countc(w),countr(w))candidatep_1 = \frac{\sum_{w}\min\big(\text{count}_c(w),\, \text{count}_r(w)\big)}{|\text{candidate}|} BP={1c>re1r/ccr,BLEU=BPp1\text{BP} = \begin{cases} 1 & c > r \\ e^{\,1 - r/c} & c \le r \end{cases}, \qquad \text{BLEU} = \text{BP}\cdot p_1

where cc and rr are the candidate and reference lengths.

Input

  • candidatelist[str]: the predicted tokens.
  • referencelist[str]: the reference tokens.

Output

Returns a float in [0,1][0, 1].

Examples

Example 1

Input:  candidate = ["the","cat","sat","on","the","mat"]
        reference = ["the","cat","is","on","the","mat"]
Output: 0.8333

Explanation: clipped matches are the(2) + cat(1) + on(1) + mat(1) = 5 of 6 candidate words, so p1=5/6p_1 = 5/6. The lengths are equal so BP=e11=1\text{BP} = e^{1-1} = 1, giving BLEU =0.8333= 0.8333.

Constraints

  • Clip each word's count by its reference count, so repeats cannot be over-credited.
  • Brevity penalty: 11 when the candidate is longer than the reference, otherwise e1r/ce^{1 - r/c}.
  • Return BP×p1\text{BP}\times p_1; treat an empty candidate as 00.

Notes

  • Clipping is what stops a candidate of "the the the the" from scoring perfect precision against a reference that contains "the" once.
  • Full BLEU multiplies the geometric mean of p1,,p4p_1, \dots, p_4 by the brevity penalty; this unigram version isolates the precision + brevity-penalty mechanics.
Python
Loading...

This problem ships 4 hidden tests. They run in your browser via Pyodide — no backend, no submission queue. Press ▶ Run tests to execute.

  • Reference example: 5/6 with BP=1 -> 0.8333
  • Exact match -> 1.0
  • Clipping prevents over-counting repeated words
  • Brevity penalty punishes too-short candidates