METEOR score
Background
METEOR scores a machine translation by aligning unigrams between the candidate and the reference, then combining a recall-weighted F-mean with a fragmentation penalty that punishes scrambled word order. Unlike BLEU, it rewards recall and word order explicitly, and it correlates better with human judgement at the sentence level.
Problem statement
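Implement meteor_score(reference, candidate, alpha=0.9, beta=3, gamma=0.5). Lowercase and tokenise both texts, count the matched unigrams $m$ (clipped), then compute:

$$P = \frac{m}{|c|}, \qquad R = \frac{m}{|r|}, \qquad F_{\text{mean}} = \frac{P \cdot R}{\alpha P + (1 - \alpha) R}$$

$$\text{Penalty} = \gamma \left( \frac{\text{chunks}}{m} \right)^{\beta}, \qquad \text{METEOR} = F_{\text{mean}} \cdot (1 - \text{Penalty})$$

where $|c|$ and $|r|$ are the token counts of the candidate and the reference, and chunks is the number of contiguous runs of matched words in the candidate. Round to 3 decimals.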
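As a rough sketch of the scoring step alone, the snippet below combines counts that are already known into a score. It assumes the standard METEOR form $F_{\text{mean}} = PR / (\alpha P + (1-\alpha) R)$ and $\text{Penalty} = \gamma\,(\text{chunks}/m)^{\beta}$; the helper name `meteor_from_counts` and its parameters are hypothetical, not part of the required interface.

```python
def meteor_from_counts(m, cand_len, ref_len, chunks, alpha=0.9, beta=3, gamma=0.5):
    """Hypothetical helper: turn precomputed counts into a METEOR score.

    m        -- number of matched unigrams (after clipping)
    cand_len -- number of candidate tokens
    ref_len  -- number of reference tokens
    chunks   -- contiguous runs of matched candidate positions
    """
    if m == 0 or cand_len == 0 or ref_len == 0:
        return 0.0
    precision = m / cand_len
    recall = m / ref_len
    # Assumed F-mean form: alpha close to 1 makes recall dominate.
    f_mean = precision * recall / (alpha * precision + (1 - alpha) * recall)
    penalty = gamma * (chunks / m) ** beta
    return round(f_mean * (1 - penalty), 3)


# Counts taken from Example 1: 4 matches, 6 tokens on each side, 2 chunks.
print(meteor_from_counts(m=4, cand_len=6, ref_len=6, chunks=2))  # 0.625
```

Separating the arithmetic from the matching makes it easy to check the formula against the reference example before worrying about tokenisation.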
Input
- reference: str.
- candidate: str.
- alpha, beta, gamma: float. METEOR parameters (defaults 0.9, 3, 0.5).
Output
Returns a float in $[0, 1]$ (rounded to 3 decimals).
Examples
Example 1
Input: meteor_score("Rain falls gently from the sky", "Gentle rain drops from the sky")
Output: 0.625
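Explanation: 4 unigrams match (rain, from, the, sky); $P = R = 4/6$ gives $F_{\text{mean}} \approx 0.667$. The matches form 2 chunks in the candidate, so $\text{Penalty} = 0.5 \cdot (2/4)^3 = 0.0625$ and METEOR $= 0.667 \cdot (1 - 0.0625) = 0.625$.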
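The match and chunk counts in this example can be traced by hand with a few lines of Python. This is an illustrative sketch only: the variable names are made up, and it assumes a chunk begins at every matched candidate position whose left neighbour is unmatched.

```python
from collections import Counter

reference = "Rain falls gently from the sky".lower().split()
candidate = "Gentle rain drops from the sky".lower().split()

available = Counter(reference)  # clipping: each reference word usable once
matched = []                    # one True/False flag per candidate position
for word in candidate:
    if available[word] > 0:
        available[word] -= 1
        matched.append(True)
    else:
        matched.append(False)

m = sum(matched)
# Assumption: a chunk starts where a matched position follows an unmatched one.
chunks = sum(
    1 for i, hit in enumerate(matched)
    if hit and (i == 0 or not matched[i - 1])
)

print(matched)    # [False, True, False, True, True, True]
print(m, chunks)  # 4 2
```

Note that "gentle" does not match "gently" because the comparison is on exact lowercase tokens, which is why only 4 of the 6 candidate words count.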
Constraints
- Lowercase + whitespace tokenise; match unigrams with clipping (each reference word usable once).
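- With $m$ matches, $|c|$ candidate tokens and $|r|$ reference tokens: $P = m/|c|$, $R = m/|r|$, $F_{\text{mean}} = PR / (\alpha P + (1 - \alpha) R)$; chunks = contiguous runs of matched candidate positions.
- Penalty $= \gamma \cdot (\text{chunks}/m)^{\beta}$; final score $= F_{\text{mean}} \cdot (1 - \text{Penalty})$, rounded to 3 dp.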
- Return 0 if there are no matches or either text is empty.
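Putting the constraints together, one possible implementation looks like the sketch below. It is not presented as the reference solution: it assumes the standard METEOR F-mean and penalty formulas, a greedy left-to-right match over the candidate, and chunk counting on candidate positions only.

```python
from collections import Counter


def meteor_score(reference, candidate, alpha=0.9, beta=3, gamma=0.5):
    # Lowercase + whitespace tokenise.
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not ref_tokens or not cand_tokens:
        return 0.0

    # Clipped matching: each reference token can be consumed once.
    available = Counter(ref_tokens)
    matched = []
    for token in cand_tokens:
        hit = available[token] > 0
        if hit:
            available[token] -= 1
        matched.append(hit)

    m = sum(matched)
    if m == 0:
        return 0.0

    # Chunks: contiguous runs of matched candidate positions.
    chunks = sum(
        1 for i, hit in enumerate(matched)
        if hit and (i == 0 or not matched[i - 1])
    )

    precision = m / len(cand_tokens)
    recall = m / len(ref_tokens)
    f_mean = precision * recall / (alpha * precision + (1 - alpha) * recall)
    penalty = gamma * (chunks / m) ** beta
    return round(f_mean * (1 - penalty), 3)


print(meteor_score("Rain falls gently from the sky",
                   "Gentle rain drops from the sky"))  # 0.625
print(meteor_score("a b", "c d"))                      # 0.0
```

The early returns cover the empty-text and no-match cases before any division happens, so the function never divides by zero.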
Notes
- The chunk penalty is METEOR's signature: the same matched words score lower when scattered (many chunks) than when contiguous (few chunks).
- The default alpha = 0.9 weights recall heavily: in translation, covering the reference matters more than terseness.
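Both notes can be checked numerically. The sketch below assumes the standard METEOR penalty $\gamma\,(\text{chunks}/m)^{\beta}$ and F-mean $PR / (\alpha P + (1-\alpha) R)$; the function names `penalty` and `f_mean` are illustrative only.

```python
def penalty(chunks, m, beta=3, gamma=0.5):
    # Assumed fragmentation penalty: grows with the chunks-to-matches ratio.
    return gamma * (chunks / m) ** beta


def f_mean(precision, recall, alpha=0.9):
    # Assumed weighted harmonic mean; alpha near 1 favours recall.
    return precision * recall / (alpha * precision + (1 - alpha) * recall)


# Same 4 matches, increasingly scattered across the candidate.
for chunks in (1, 2, 4):
    print(chunks, penalty(chunks, m=4))
# 1 0.0078125
# 2 0.0625
# 4 0.5

# Swapping precision and recall shows the recall bias of alpha = 0.9.
print(round(f_mean(precision=0.5, recall=1.0), 3))  # 0.909
print(round(f_mean(precision=1.0, recall=0.5), 3))  # 0.526
```

With four matches, going from one chunk to four raises the penalty from under 1% to the full 50%, while a candidate that covers the whole reference but is twice as long still scores far better than one that is precise but covers only half of it.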
This problem ships 4 hidden tests:
- Reference example: 0.625
- Longer exact match scores near 1
- No overlapping words -> 0
- Fragmented matches score lower than contiguous (same number of matches)