DBSCAN clustering
Background
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) groups points that are packed closely together and labels points in sparse regions as noise. Unlike k-means it needs no preset cluster count, discovers arbitrarily-shaped clusters, and is robust to outliers. Two knobs control it: eps (neighbourhood radius) and min_samples (how many points make a region "dense").
Problem statement
Implement dbscan(X, eps, min_samples) that returns a cluster label for every point. A point is a core point when its -neighbourhood contains at least min_samples points (including itself):
Clusters grow from core points by absorbing every point density-reachable from them; points reachable from no core point are noise (label ).
Input
X—np.ndarrayof shape(n_samples, n_features): the points.eps—float: neighbourhood radius (inclusive).min_samples—int: minimum points in an -neighbourhood (counting the point itself) for it to be core.
Output
Returns an np.ndarray of shape (n_samples,) of integer labels: cluster ids 0, 1, 2, … in order of discovery, and -1 for noise.
Examples
Example 1
Input: X = [[0,0],[0,0.5],[0.5,0],[10,10],[10,10.5],[10.5,10],[50,50]]
eps = 1.0, min_samples = 2
Output: [0, 0, 0, 1, 1, 1, -1]
Explanation: the three points near the origin form one dense cluster (label 0), the three near (10, 10) form another (label 1), and the lone point at (50, 50) has no neighbour within eps, so it is noise (-1).
Constraints
- Use Euclidean distance; the
epsneighbourhood is inclusive (). min_samplescounts the point itself.- Noise points get label
-1; cluster ids start at0and increase in discovery order. - A border point (non-core but within
epsof a core point) joins that core point's cluster.
Notes
- DBSCAN's strength is finding non-convex clusters and ignoring outliers; its weakness is the single global
eps, which struggles when clusters have very different densities. - A border point can be reachable from more than one cluster; the standard algorithm assigns it to whichever cluster reaches it first — an order-dependent but harmless ambiguity.
This problem ships 4 hidden tests. They run in your browser via Pyodide — no backend, no submission queue. Press ▶ Run tests to execute.
- •Two dense blobs plus one outlier -> 2 clusters and 1 noise point
- •A single dense cluster gets one label and no noise
- •min_samples larger than the dataset marks everything as noise
- •Tiny eps isolates every point as noise