ROUGE-1 scoreEasy

ROUGE-1 score

Background

ROUGE is the standard recall-oriented metric for summarisation: it measures how much of a reference summary's content appears in a generated one. ROUGE-1 counts overlapping unigrams and reports precision, recall, and their F1 — the complement to BLEU, which is precision-oriented for translation.

Problem statement

Implement rouge_1_score(reference, candidate) returning a dict with precision, recall, and f1 for unigram overlap. Lowercase and whitespace-tokenise both texts, count the clipped unigram overlap, then:

precision=overlapcandidate,recall=overlapreference,F1=2PRP+R\text{precision} = \frac{\text{overlap}}{|\text{candidate}|}, \qquad \text{recall} = \frac{\text{overlap}}{|\text{reference}|}, \qquad \text{F1} = \frac{2\,PR}{P + R}

where overlap=wmin(countref(w),countcand(w))\text{overlap} = \sum_w \min\big(\text{count}_{\text{ref}}(w),\, \text{count}_{\text{cand}}(w)\big).

Input

  • referencestr: the reference text.
  • candidatestr: the generated text.

Output

Returns a dict with float values under keys "precision", "recall", "f1" (each in [0,1][0, 1]).

Examples

Example 1

Input:  reference = "the cat sat on the mat", candidate = "the cat is on the mat"
Output: {"precision": 0.8333, "recall": 0.8333, "f1": 0.8333}

Explanation: clipped overlap is the(2) + cat(1) + on(1) + mat(1) = 5; both texts have 6 tokens, so P=R=5/6P = R = 5/6 and F1 =5/6= 5/6.

Constraints

  • Lowercase and split on whitespace.
  • Overlap is clipped by per-word counts in both texts.
  • Guard divide-by-zero: any metric whose denominator is 0 returns 0.00.0.

Notes

  • ROUGE-1 precision is essentially BLEU's unigram precision; ROUGE adds the recall side (did we cover the reference?), which is what matters for summarisation.
  • ROUGE-L (longest common subsequence) and ROUGE-2 (bigrams) extend this; ROUGE-1 is the simplest, most common baseline.
Python
Loading...

This problem ships 4 hidden tests. They run in your browser via Pyodide — no backend, no submission queue. Press ▶ Run tests to execute.

  • Reference example: all 5/6
  • Identical texts -> all 1.0
  • No overlap -> all zeros
  • Precision and recall differ with different lengths