k-NN classification (majority vote)Easy

k-NN classification (majority vote)

Background

k-nearest-neighbours is the simplest non-parametric classifier and the warm-up problem in every classical-ML interview. There is no training — to classify a query point you just find the k closest training points (by Euclidean distance) and take a majority vote of their labels. It is the conceptual baseline that more sophisticated classifiers are compared against.

Problem statement

Implement knn_classify(x_query, X_train, y_train, k):

di=xqueryXi2,y^=mode({yi:itop-k nearest})d_i = \lVert x_{\text{query}} - X_i \rVert_2, \qquad \hat{y} = \text{mode}\big(\{\,y_i : i \in \text{top-}k\text{ nearest}\,\}\big)
  1. Euclidean distance from x_query to every row of X_train.
  2. The k indices with the smallest distances.
  3. Majority vote among their labels; on a tie, return the smallest class label (deterministic).

Input

  • x_query — 1-D np.ndarray of shape (D,): the point to classify.
  • X_train — 2-D np.ndarray of shape (N, D): the training points.
  • y_train — 1-D np.ndarray of shape (N,): integer class labels (0\ge 0).
  • kint 1\ge 1: the number of neighbours to vote.

Output

Returns an int: the predicted class label.

Examples

Example 1 — k=1 returns the nearest neighbour's label

Input:  x_query = [0.1, 0.1], X_train = [[0,0],[10,10]], y_train = [0,1], k = 1
Output: 0

Explanation: the query is closest to (0,0), whose label is 0; with k=1 the prediction is simply that nearest point's label.

Example 2 — a tie resolves to the smaller label

Input:  x_query = [0.5, 0], X_train = [[0,0],[1,0]], y_train = [0,1], k = 2
Output: 0

Explanation: with k=2 both neighbours are included — one class 0, one class 1 — a tie. The convention returns the smallest label, 0.

Constraints

  • Compute distances vectorised (e.g. np.sqrt(((X_train - x_query)**2).sum(axis=1))) — avoid a Python loop over the N rows.
  • Tie-breaking is deterministic: when classes tie for the most votes, return the smallest label. np.bincount + np.argmax gives this for free (argmax returns the lowest index among ties).
  • k = N falls back to the global majority class; labels are non-negative integers.
  • Return a plain int.

Notes

  • Determinism for free. Building the vote histogram with np.bincount and taking np.argmax resolves ties to the smallest label automatically — a Counter/hashmap approach would need explicit tie-breaking logic.
  • Scaling. The naive O(ND)O(N\cdot D) distance scan is fine for small N; k-d trees and ball trees beat it on large datasets. Related: this is the classifier counterpart of k-means clustering.
Python
Loading...

This problem ships 6 hidden tests. They run in your browser via Pyodide — no backend, no submission queue. Press ▶ Run tests to execute.

  • Returns an int label
  • k=1 returns the label of the single nearest training point
  • Diagnostic: majority vote among 3 nearest works correctly
  • Tie-breaking: when classes tie, return the smallest label
  • k = N falls back to global majority class
  • Multi-class (3+ labels) works