F1 score (binary classification)

Difficulty: Easy

Background

The F1 score is the standard one-number summary of a binary classifier: the harmonic mean of precision (of the things you flagged positive, how many were right?) and recall (of the truly positive things, how many did you catch?). The harmonic mean is the point — it punishes imbalance, so a model that is precise but barely recalls anything is not rewarded the way a plain average would reward it.

Problem statement

Implement f1_score(y_pred, y_true) for binary 0/1 arrays. With true positives $TP$, false positives $FP$, and false negatives $FN$:

$$P = \frac{TP}{TP+FP}, \qquad R = \frac{TP}{TP+FN}, \qquad F_1 = \frac{2PR}{P+R} = \frac{2\,TP}{2\,TP + FP + FN}$$
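
The last equality in the formula can be checked by substituting the definitions of $P$ and $R$. A short derivation, assuming $TP > 0$ so that both ratios are defined and nonzero:

```latex
\begin{aligned}
F_1 = \frac{2PR}{P+R}
  &= \frac{2 \cdot \frac{TP}{TP+FP} \cdot \frac{TP}{TP+FN}}
          {\frac{TP}{TP+FP} + \frac{TP}{TP+FN}} \\
  &= \frac{2\,TP^2}{TP\,(TP+FN) + TP\,(TP+FP)}
     \qquad \text{(multiply through by } (TP+FP)(TP+FN)\text{)} \\
  &= \frac{2\,TP^2}{TP\,(2\,TP + FP + FN)}
   = \frac{2\,TP}{2\,TP + FP + FN}
\end{aligned}
```

When $TP = 0$ but $FP + FN > 0$, the count form evaluates to 0 directly, which agrees with the 0.0 convention required for the ratio form.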

Return 0.0 (not NaN) whenever a denominator is zero.

Input

  • y_pred — 1-D np.ndarray of 0s and 1s: the predictions.
  • y_true — 1-D np.ndarray of 0s and 1s: the ground-truth labels.

Output

Returns a float in $[0, 1]$.

Examples

Example 1 — a hand-checked case

Input:  y_true = [1, 1, 1, 0, 0, 0], y_pred = [1, 1, 0, 1, 0, 0]
Output: 0.6667   (= 2/3)

Explanation: $TP = 2$ (predicted 1 and truly 1), $FP = 1$ (predicted 1 but truly 0), $FN = 1$ (truly 1 but predicted 0). So $P = 2/3$, $R = 2/3$, and $F_1 = \frac{2\cdot\frac23\cdot\frac23}{\frac23+\frac23} = \frac23$.
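
The hand count in Example 1 can be reproduced mechanically. The snippet below is a small check rather than a solution: it builds the three counts from boolean masks and uses Python's `fractions.Fraction` (a choice made here to keep the arithmetic exact, not something the problem requires).

```python
from fractions import Fraction

import numpy as np

y_true = np.array([1, 1, 1, 0, 0, 0])
y_pred = np.array([1, 1, 0, 1, 0, 0])

# Boolean masks pick out each kind of outcome; summing a mask counts its True entries.
tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # predicted 1 and truly 1
fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # predicted 1 but truly 0
fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # truly 1 but predicted 0

# Exact rational arithmetic, so the result is 2/3 with no rounding.
precision = Fraction(tp, tp + fp)
recall = Fraction(tp, tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(tp, fp, fn)  # 2 1 1
print(f1)          # 2/3
```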

Example 2 — no positive predictions returns 0, not NaN

Input:  y_true = [1, 1, 0, 0], y_pred = [0, 0, 0, 0]
Output: 0.0

Explanation: the model never predicts positive, so $TP + FP = 0$ and precision is $0/0$. Return 0.0 explicitly instead of letting the division produce NaN.
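
To see why the explicit guard matters, here is a small illustration on the Example 2 inputs. It assumes the counts are left as NumPy integers (as `np.sum` returns them); with plain Python ints the same division would raise `ZeroDivisionError` instead of producing NaN.

```python
import numpy as np

y_true = np.array([1, 1, 0, 0])
y_pred = np.array([0, 0, 0, 0])

tp = np.sum((y_pred == 1) & (y_true == 1))  # NumPy integer 0
fp = np.sum((y_pred == 1) & (y_true == 0))  # NumPy integer 0

# Unguarded: dividing NumPy scalars 0/0 yields nan (normally with a
# RuntimeWarning, silenced here to keep the output clean).
with np.errstate(divide="ignore", invalid="ignore"):
    naive_precision = tp / (tp + fp)

# Guarded: check the denominator before dividing.
safe_precision = float(tp / (tp + fp)) if (tp + fp) > 0 else 0.0

print(np.isnan(naive_precision))  # True
print(safe_precision)             # 0.0
```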

Constraints

  • Count $TP$, $FP$, $FN$ from boolean masks (e.g. (y_pred==1) & (y_true==1) for $TP$).
  • Return 0.0 when TP + FP == 0 (no positive predictions), when TP + FN == 0 (no positive labels), or when P + R == 0 — never NaN/inf.
  • $F_1$ is symmetric in precision and recall: a $(P{=}1, R{=}0.2)$ model and a $(P{=}0.2, R{=}1)$ model both score $1/3$.
  • Binary only; the output lies in $[0, 1]$, equal to 1.0 only on perfect predictions.
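
One way to satisfy these constraints is sketched below. It is a minimal illustration, not the reference solution: it follows the mask-based counting and the three zero-denominator guards listed above, and the `np.asarray` conversion is an added convenience so plain lists also work.

```python
import numpy as np


def f1_score(y_pred: np.ndarray, y_true: np.ndarray) -> float:
    """Binary F1 for 0/1 arrays; returns 0.0 whenever a denominator is zero."""
    y_pred = np.asarray(y_pred)
    y_true = np.asarray(y_true)

    # Count each outcome from a boolean mask.
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))

    # No positive predictions, or no positive labels: precision or recall is 0/0.
    if tp + fp == 0 or tp + fn == 0:
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)

    # Both are zero when TP == 0; the harmonic mean would divide by zero.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Note the argument order: predictions first, labels second.
print(round(f1_score(np.array([1, 1, 0, 1, 0, 0]), np.array([1, 1, 1, 0, 0, 0])), 4))  # 0.6667
print(f1_score(np.array([0, 0, 0, 0]), np.array([1, 1, 0, 0])))                        # 0.0
```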

Notes

  • Why harmonic, not arithmetic. A $(P{=}1.0, R{=}0.01)$ model scores $F_1 \approx 0.02$, not $0.5$: the harmonic mean collapses toward the weaker of the two, so one strong half cannot mask a failing half.
  • Multi-class. Compute F1 per class and average (macro / weighted / micro); this problem is the binary base case.
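
The harmonic-versus-arithmetic contrast is easy to see numerically. The helpers below are illustrative only (`harmonic_mean` and `arithmetic_mean` are names chosen here, not part of the problem); they take precision and recall directly rather than label arrays.

```python
def harmonic_mean(p: float, r: float) -> float:
    """F1 written in terms of precision and recall (0.0 if both are 0)."""
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def arithmetic_mean(p: float, r: float) -> float:
    """Plain average, shown only for comparison."""
    return (p + r) / 2


# Lopsided pairs first, then a balanced one.
for p, r in [(1.0, 0.01), (1.0, 0.2), (0.2, 1.0), (0.6, 0.6)]:
    print(f"P={p:<4} R={r:<4} "
          f"harmonic={harmonic_mean(p, r):.3f} "
          f"arithmetic={arithmetic_mean(p, r):.3f}")
```

For the lopsided $(1.0, 0.01)$ pair the harmonic mean is about 0.02 while the arithmetic mean is about 0.5; the two mirrored pairs give the same harmonic mean of $1/3$; and the balanced $(0.6, 0.6)$ pair gives 0.6 under both, since the two means only diverge when precision and recall differ.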

This problem ships 6 hidden tests. They run in your browser via Pyodide, with no backend and no submission queue.

  • Perfect predictions: F1 = 1.0
  • All wrong: F1 = 0.0
  • Diagnostic: matches the harmonic-mean formula on a hand-checked case
  • No positive predictions (TP + FP = 0): returns 0.0, not NaN
  • No positive labels (TP + FN = 0): returns 0.0, not NaN
  • F1 is symmetric in precision and recall (high precision + low recall == low precision + high recall)
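
The actual tests are hidden, so the inputs below are guesses that merely mirror the six descriptions above. The function `f1_counts` is a hypothetical name for an alternative sketch using the count form $2\,TP / (2\,TP + FP + FN)$, where a single `tp == 0` check covers every zero-denominator case (each of them implies $TP = 0$, and $F_1$ is 0 whenever $TP = 0$).

```python
import numpy as np


def f1_counts(y_pred: np.ndarray, y_true: np.ndarray) -> float:
    """F1 via 2TP / (2TP + FP + FN), with one guard for all degenerate cases."""
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    # TP == 0 covers TP + FP == 0, TP + FN == 0, and P + R == 0 alike.
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


a = np.array
# name -> (y_pred, y_true, expected); inputs are illustrative assumptions.
cases = {
    "perfect": (a([1, 0, 1, 0]), a([1, 0, 1, 0]), 1.0),
    "all wrong": (a([0, 1, 0, 1]), a([1, 0, 1, 0]), 0.0),
    "hand-checked": (a([1, 1, 0, 1, 0, 0]), a([1, 1, 1, 0, 0, 0]), 2 / 3),
    "no positive predictions": (a([0, 0, 0, 0]), a([1, 1, 0, 0]), 0.0),
    "no positive labels": (a([1, 1, 0, 0]), a([0, 0, 0, 0]), 0.0),
}
for name, (y_pred, y_true, expected) in cases.items():
    assert np.isclose(f1_counts(y_pred, y_true), expected), name

# Symmetry: TP=1, FP=0, FN=4 (P=1, R=0.2) versus TP=1, FP=4, FN=0 (P=0.2, R=1).
high_precision = f1_counts(a([1, 0, 0, 0, 0]), a([1, 1, 1, 1, 1]))
high_recall = f1_counts(a([1, 1, 1, 1, 1]), a([1, 0, 0, 0, 0]))
assert np.isclose(high_precision, high_recall)
assert np.isclose(high_precision, 1 / 3)
```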