Cohen's kappa score

Difficulty: Medium

Background

Cohen's kappa ($\kappa$) measures agreement between two raters, or between a prediction and ground truth, corrected for the agreement expected by chance. Two raters labelling at random still agree sometimes, so raw accuracy overstates real agreement; $\kappa$ subtracts that chance baseline and rescales by the largest improvement over chance that is possible. $\kappa = 1$ is perfect agreement, $0$ is chance-level, and negative values mean worse than chance. It is a more honest metric than raw accuracy for imbalanced labelling tasks with two raters.

Problem statement

Implement cohens_kappa(y1, y2) for two label sequences:

$$\kappa = \frac{p_o - p_e}{1 - p_e}$$

where $p_o$ is the observed agreement (fraction of matching labels) and $p_e$ is the chance agreement $p_e = \sum_c p_1(c)\,p_2(c)$, with $p_r(c)$ the fraction of rater $r$'s labels equal to class $c$.

Input

  • y1 — array-like of labels from rater 1.
  • y2 — array-like of labels from rater 2, the same length.

Output

Returns a float (typically in $[-1, 1]$; $1$ means perfect agreement).

Examples

Example 1

Input:  y1 = [1, 0, 1, 1, 0], y2 = [1, 0, 0, 1, 0]
Output: 0.6154

Explanation: observed agreement $p_o = 4/5 = 0.8$. Rater 1 assigns class 1 to 60% of items and class 0 to 40%; rater 2 assigns class 1 to 40% and class 0 to 60%. Chance agreement is therefore $p_e = (0.6)(0.4) + (0.4)(0.6) = 0.48$, so $\kappa = (0.8 - 0.48)/(1 - 0.48) = 0.6154$.
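The arithmetic in this explanation can be reproduced step by step. The snippet below is a minimal sketch using only the standard library; the variable names (`c1`, `c2`, `n`) are illustrative choices, not part of the problem statement.

```python
from collections import Counter

y1 = [1, 0, 1, 1, 0]
y2 = [1, 0, 0, 1, 0]
n = len(y1)

# Observed agreement: fraction of positions where the two labels match.
p_o = sum(a == b for a, b in zip(y1, y2)) / n

# Per-rater class counts (the marginals before normalisation).
c1, c2 = Counter(y1), Counter(y2)

# Chance agreement: sum over classes of the product of the two marginals.
# A Counter returns 0 for a class the rater never used.
p_e = sum((c1[c] / n) * (c2[c] / n) for c in set(c1) | set(c2))

kappa = (p_o - p_e) / (1 - p_e)
print(round(p_o, 4), round(p_e, 4), round(kappa, 4))  # 0.8 0.48 0.6154
```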

Constraints

  • $p_o$ is the fraction of positions where the two labels match.
  • $p_e = \sum_c p_1(c)\,p_2(c)$, summed over all classes that appear in either sequence.
  • If $p_e = 1$ (both raters constant on the same class), return $1.0$ to guard against the $0/0$ division.
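Putting the three constraints together, one possible implementation looks like the sketch below. It is not the platform's reference solution. The length check and the empty-input error are assumptions, since the problem statement does not say how those cases should be handled.

```python
from collections import Counter


def cohens_kappa(y1, y2):
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    y1, y2 = list(y1), list(y2)
    if len(y1) != len(y2):
        raise ValueError("y1 and y2 must have the same length")
    n = len(y1)
    if n == 0:
        # Assumption: empty input is rejected; the problem does not specify this.
        raise ValueError("label sequences must be non-empty")

    # Observed agreement: fraction of matching positions.
    p_o = sum(a == b for a, b in zip(y1, y2)) / n

    # Chance agreement over classes appearing in either sequence.
    # Integer counts are multiplied first so that p_e == 1 is detected exactly.
    c1, c2 = Counter(y1), Counter(y2)
    p_e = sum(c1[c] * c2[c] for c in c1.keys() | c2.keys()) / (n * n)

    if p_e == 1.0:
        # Both raters used one and the same class everywhere: guard the 0/0.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


print(round(cohens_kappa([1, 0, 1, 1, 0], [1, 0, 0, 1, 0]), 4))  # 0.6154
print(cohens_kappa(["a", "a", "a"], ["a", "a", "a"]))            # 1.0
```

Summing products of integer counts and dividing once by `n * n` keeps the `p_e == 1.0` comparison free of floating-point rounding, which matters because the guard relies on an exact match.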

Notes

  • $\kappa$ can be negative when raters agree less often than chance would predict.
  • The chance correction is what makes $\kappa$ informative on imbalanced data, where plain accuracy looks high simply because one class dominates.
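Both notes can be illustrated with small made-up inputs. The sketch below uses a condensed helper, `kappa_and_accuracy`, which is a hypothetical name introduced only for this demonstration, and the label sequences are invented for illustration.

```python
from collections import Counter


def kappa_and_accuracy(y1, y2):
    """Return (accuracy, kappa) for two equal-length, non-empty label lists."""
    n = len(y1)
    p_o = sum(a == b for a, b in zip(y1, y2)) / n
    c1, c2 = Counter(y1), Counter(y2)
    p_e = sum(c1[c] * c2[c] for c in c1.keys() | c2.keys()) / (n * n)
    kappa = 1.0 if p_e == 1.0 else (p_o - p_e) / (1.0 - p_e)
    return p_o, kappa


# Imbalanced data: 95 negatives, 5 positives; the "prediction" is always 0.
truth = [0] * 95 + [1] * 5
always_zero = [0] * 100
print(kappa_and_accuracy(truth, always_zero))  # (0.95, 0.0)

# Systematic disagreement: every label is flipped.
print(kappa_and_accuracy([0, 1, 0, 1], [1, 0, 1, 0]))  # (0.0, -1.0)
```

In the first case accuracy is 0.95 although the constant predictor carries no information, and kappa reports exactly chance level. In the second case the raters never agree, while chance alone would give 50% agreement, so kappa is negative.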

This problem ships 4 hidden tests. They run in your browser via Pyodide, with no backend and no submission queue.

  • Reference example: kappa = 0.6154
  • Perfect agreement -> 1.0
  • Symmetric in the two raters
  • Chance-level agreement gives kappa = 0
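The hidden tests themselves are not shown, so the checks below are only a guess at what the four descriptions might look like as assertions. Apart from the reference example, every input is invented, and the inline `cohens_kappa` is a condensed stand-in so that the snippet runs on its own.

```python
import math
from collections import Counter


def cohens_kappa(y1, y2):
    """Condensed stand-in implementation used only to exercise the checks."""
    n = len(y1)
    p_o = sum(a == b for a, b in zip(y1, y2)) / n
    c1, c2 = Counter(y1), Counter(y2)
    p_e = sum(c1[c] * c2[c] for c in c1.keys() | c2.keys()) / (n * n)
    return 1.0 if p_e == 1.0 else (p_o - p_e) / (1.0 - p_e)


# Reference example: kappa = 0.6154
assert math.isclose(cohens_kappa([1, 0, 1, 1, 0], [1, 0, 0, 1, 0]), 0.6154, abs_tol=1e-4)

# Perfect agreement -> 1.0
assert math.isclose(cohens_kappa([0, 1, 2, 1], [0, 1, 2, 1]), 1.0)

# Symmetric in the two raters
a, b = [0, 1, 1, 2, 0], [0, 2, 1, 2, 1]
assert math.isclose(cohens_kappa(a, b), cohens_kappa(b, a))

# Chance-level agreement gives kappa = 0: here p_o = 0.5 and p_e = 0.5
assert math.isclose(cohens_kappa([0, 0, 1, 1], [0, 1, 0, 1]), 0.0, abs_tol=1e-12)

print("all checks passed")
```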