k-NN classification (majority vote)
Background
k-nearest-neighbours is the simplest non-parametric classifier and the warm-up problem in every classical-ML interview. There is no training — to classify a query point you just find the k closest training points (by Euclidean distance) and take a majority vote of their labels. It is the conceptual baseline that more sophisticated classifiers are compared against.
Problem statement
Implement knn_classify(x_query, X_train, y_train, k):
- Euclidean distance from
x_queryto every row ofX_train. - The
kindices with the smallest distances. - Majority vote among their labels; on a tie, return the smallest class label (deterministic).
Input
x_query— 1-Dnp.ndarrayof shape(D,): the point to classify.X_train— 2-Dnp.ndarrayof shape(N, D): the training points.y_train— 1-Dnp.ndarrayof shape(N,): integer class labels ().k—int: the number of neighbours to vote.
Output
Returns an int: the predicted class label.
Examples
Example 1 — k=1 returns the nearest neighbour's label
Input: x_query = [0.1, 0.1], X_train = [[0,0],[10,10]], y_train = [0,1], k = 1
Output: 0
Explanation: the query is closest to (0,0), whose label is 0; with k=1 the prediction is simply that nearest point's label.
Example 2 — a tie resolves to the smaller label
Input: x_query = [0.5, 0], X_train = [[0,0],[1,0]], y_train = [0,1], k = 2
Output: 0
Explanation: with k=2 both neighbours are included — one class 0, one class 1 — a tie. The convention returns the smallest label, 0.
Constraints
- Compute distances vectorised (e.g.
np.sqrt(((X_train - x_query)**2).sum(axis=1))) — avoid a Python loop over theNrows. - Tie-breaking is deterministic: when classes tie for the most votes, return the smallest label.
np.bincount+np.argmaxgives this for free (argmax returns the lowest index among ties). k = Nfalls back to the global majority class; labels are non-negative integers.- Return a plain
int.
Notes
- Determinism for free. Building the vote histogram with
np.bincountand takingnp.argmaxresolves ties to the smallest label automatically — aCounter/hashmap approach would need explicit tie-breaking logic. - Scaling. The naive distance scan is fine for small
N; k-d trees and ball trees beat it on large datasets. Related: this is the classifier counterpart of k-means clustering.
This problem ships 6 hidden tests. They run in your browser via Pyodide — no backend, no submission queue. Press ▶ Run tests to execute.
- •Returns an int label
- •k=1 returns the label of the single nearest training point
- •Diagnostic: majority vote among 3 nearest works correctly
- •Tie-breaking: when classes tie, return the smallest label
- •k = N falls back to global majority class
- •Multi-class (3+ labels) works