F1 score (binary classification)
Background
The F1 score is the standard one-number summary of a binary classifier: the harmonic mean of precision (of the things you flagged positive, how many were right?) and recall (of the truly positive things, how many did you catch?). The harmonic mean is the point — it punishes imbalance, so a model that is precise but barely recalls anything is not rewarded the way a plain average would reward it.
Problem statement
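Implement f1_score(y_pred, y_true) for binary 0/1 arrays. With true positives $TP$, false positives $FP$, and false negatives $FN$:

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2PR}{P + R}$$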
Return 0.0 (not NaN) whenever a denominator is zero.
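One way the statement above could be implemented is sketched here. It is a minimal sketch rather than a reference solution: it assumes both inputs contain only 0s and 1s and have the same length, and it counts true positives (tp), false positives (fp), and false negatives (fn) from boolean masks before guarding every denominator.

```python
import numpy as np


def f1_score(y_pred: np.ndarray, y_true: np.ndarray) -> float:
    """Binary F1 from 0/1 arrays; returns 0.0 wherever a denominator is zero."""
    y_pred = np.asarray(y_pred)
    y_true = np.asarray(y_true)

    # Count each outcome from boolean masks.
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))

    # No positive predictions or no positive labels: precision or recall
    # would be 0/0, so return 0.0 explicitly.
    if tp + fp == 0 or tp + fn == 0:
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)

    # Both are zero when tp == 0; guard the final division as well.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Converting the counts to plain Python ints keeps the guards simple: the function never reaches a division whose denominator is zero, so neither NaN nor an exception can occur.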
Input
- y_pred: 1-D np.ndarray of 0s and 1s, the predictions.
- y_true: 1-D np.ndarray of 0s and 1s, the ground-truth labels.
Output
Returns a float in $[0, 1]$.
Examples
Example 1 — a hand-checked case
Input: y_true = [1, 1, 1, 0, 0, 0], y_pred = [1, 1, 0, 1, 0, 0]
Output: 0.6667 (= 2/3)
Explanation: $TP = 2$ (predicted 1 and truly 1), $FP = 1$ (predicted 1 but truly 0), $FN = 1$ (truly 1 but predicted 0). So $P = 2/3$, $R = 2/3$, and $F_1 = 2/3$.
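The hand check for Example 1 can be reproduced step by step. The snippet below is only an illustrative trace of the counting, with variable names chosen for this sketch; it does not guard against zero denominators because this input has none.

```python
import numpy as np

y_true = np.array([1, 1, 1, 0, 0, 0])
y_pred = np.array([1, 1, 0, 1, 0, 0])

# Each mask is True exactly where the named outcome occurs.
tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # positions 0 and 1
fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # position 3
fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # position 2

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(tp, fp, fn)    # 2 1 1
print(round(f1, 4))  # 0.6667
```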
Example 2 — no positive predictions returns 0, not NaN
Input: y_true = [1, 1, 0, 0], y_pred = [0, 0, 0, 0]
Output: 0.0
Explanation: the model never predicts positive, so $TP + FP = 0$ and precision is $0/0$, which is undefined. Return 0.0 explicitly instead of letting the division produce NaN.
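To see where the NaN would come from, the sketch below computes precision for Example 2 both without and with a guard. It assumes the counts are left as NumPy integer scalars (the direct result of np.sum); with plain Python ints the unguarded division would raise ZeroDivisionError instead.

```python
import math

import numpy as np

y_true = np.array([1, 1, 0, 0])
y_pred = np.array([0, 0, 0, 0])

tp = np.sum((y_pred == 1) & (y_true == 1))  # NumPy integer scalar, here 0
fp = np.sum((y_pred == 1) & (y_true == 0))  # also 0

# Unguarded: NumPy scalar division 0/0 evaluates to nan. The accompanying
# RuntimeWarning is silenced here so the example runs quietly.
with np.errstate(divide="ignore", invalid="ignore"):
    naive_precision = tp / (tp + fp)

# Guarded: check the denominator first and fall back to 0.0.
safe_precision = float(tp / (tp + fp)) if (tp + fp) > 0 else 0.0

print(math.isnan(naive_precision))  # True
print(safe_precision)               # 0.0
```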
Constraints
- Count from boolean masks (e.g. (y_pred == 1) & (y_true == 1) for $TP$).
- Return 0.0 when TP + FP == 0 (no positive predictions), when TP + FN == 0 (no positive labels), or when P + R == 0; never NaN/inf.
- $F_1$ is symmetric in precision and recall: swapping a model's precision and recall leaves its score unchanged.
- Binary only; the output lies in $[0, 1]$, equal to 1.0 only on perfect predictions.
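The symmetry constraint can be checked directly on the formula. In the sketch below, f1_from_pr is a hypothetical helper introduced only for illustration, and the precision/recall pair (0.9, 0.1) is an arbitrary example, not a value taken from the problem.

```python
import math


def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, with 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative values: one precise model and its mirror image.
high_precision = f1_from_pr(0.9, 0.1)
high_recall = f1_from_pr(0.1, 0.9)

# Compare with a tolerance, since floating-point products may differ in the last bit.
print(math.isclose(high_precision, high_recall))  # True
```

Both calls evaluate 2 * 0.09 / 1.0, so the two models land on the same score of roughly 0.18.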
Notes
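- Why harmonic, not arithmetic. The harmonic mean collapses toward the weaker of precision and recall, so a model that is strong on one and failing on the other scores near the failing value rather than near the midpoint. One strong half cannot mask a failing half.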
- Multi-class. Compute F1 per class and average (macro / weighted / micro); this problem is the binary base case.
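Both notes can be made concrete with a short sketch. The first part compares the harmonic and arithmetic means on an arbitrary illustrative pair (precision 1.0, recall 0.1); the second shows one possible macro average, computing a one-vs-rest F1 for each label and taking the unweighted mean. The helper names are hypothetical, and the per-class score uses the equivalent form F1 = 2TP / (2TP + FP + FN) rather than going through precision and recall.

```python
import numpy as np


def harmonic_mean(p: float, r: float) -> float:
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def arithmetic_mean(p: float, r: float) -> float:
    return (p + r) / 2


def macro_f1(y_pred, y_true) -> float:
    """Unweighted mean of one-vs-rest F1 over every label seen in either array."""
    y_pred = np.asarray(y_pred)
    y_true = np.asarray(y_true)
    scores = []
    for label in np.union1d(y_true, y_pred):
        tp = int(np.sum((y_pred == label) & (y_true == label)))
        fp = int(np.sum((y_pred == label) & (y_true != label)))
        fn = int(np.sum((y_pred != label) & (y_true == label)))
        denom = 2 * tp + fp + fn  # algebraically equal to the 2PR / (P + R) form
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))


# Illustrative pair: the harmonic mean stays near the weak recall.
print(round(harmonic_mean(1.0, 0.1), 4))    # 0.1818
print(round(arithmetic_mean(1.0, 0.1), 4))  # 0.55

# Three-class toy data: per-class F1 is 1.0, 0.0, and 0.8.
print(round(macro_f1([0, 2, 2, 2], [0, 1, 2, 2]), 4))  # 0.6
```

Weighted and micro averaging differ only in how the per-class counts or scores are combined; they are omitted here to keep the sketch short.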
This problem ships 6 hidden tests. They run in your browser via Pyodide, with no backend and no submission queue.
- Perfect predictions: F1 = 1.0
- All wrong: F1 = 0.0
- Diagnostic: matches the harmonic-mean formula on a hand-checked case
- No positive predictions (TP + FP = 0): returns 0.0, not NaN
- No positive labels (TP + FN = 0): returns 0.0, not NaN
- F1 is symmetric in precision and recall (high precision + low recall == low precision + high recall)