Time-series anomaly detection

Difficulty: Medium

Background

Detecting anomalies (spikes, dropouts, sensor faults) in a time series is a core monitoring task. A naive z-score using the mean and standard deviation is itself wrecked by the very outliers you want to find — a single huge spike inflates the std and hides itself. The robust fix is the modified z-score (Iglewicz & Hoaglin), built on the median and the median absolute deviation (MAD), both of which barely move when a few points go haywire.

Problem statement

Implement detect_anomalies(x, threshold=3.5) returning the indices of points whose modified z-score exceeds the threshold in magnitude:

$$\tilde z_i = \frac{0.6745\,(x_i - \tilde x)}{\text{MAD}}, \qquad \text{MAD} = \operatorname{median}\big(|x_i - \tilde x|\big)$$

where $\tilde x$ is the median of $x$. Flag index $i$ when $|\tilde z_i| > \text{threshold}$.

If MAD = 0, fall back to the mean absolute deviation: $\tilde z_i = (x_i - \tilde x)/(1.2533\cdot\text{meanAD})$. If that is also 0 (all values identical), report no anomalies.
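
The rules above can be sketched in a few lines of NumPy. This is one possible implementation, not a reference solution. It assumes that meanAD is the mean of the absolute deviations about the median (the statement does not say which center to use) and that an empty input yields no anomalies.

```python
import numpy as np

def detect_anomalies(x, threshold=3.5):
    """Return ascending indices whose modified z-score exceeds threshold in magnitude."""
    x = np.asarray(x, dtype=float)
    if x.size == 0:
        # Assumption: an empty series has no anomalies.
        return np.array([], dtype=int)

    dev = x - np.median(x)      # signed deviations from the median
    abs_dev = np.abs(dev)
    mad = np.median(abs_dev)    # median absolute deviation

    if mad > 0:
        z = 0.6745 * dev / mad
    else:
        # Fallback for MAD == 0.
        # Assumption: meanAD is measured about the median, not the mean.
        mean_ad = abs_dev.mean()
        if mean_ad == 0:
            # All values identical: nothing to flag.
            return np.array([], dtype=int)
        z = dev / (1.2533 * mean_ad)

    # flatnonzero returns the matching indices in ascending order.
    return np.flatnonzero(np.abs(z) > threshold)
```

With this sketch, `detect_anomalies([1, 2, 1, 2, 1, 100, 2, 1])` flags index 5, and `detect_anomalies([1, 1, 1, 1, 1, 1, 10])` takes the fallback branch (MAD is 0) and flags index 6.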

Input

  • x: 1-D sequence (np.ndarray or list) of values.
  • threshold: float, the modified-z cutoff (default 3.5, the conventional value).

Output

An np.ndarray of integer indices (ascending) flagged as anomalies; empty if none.

Examples

Example 1

Input:  x = [1, 2, 1, 2, 1, 100, 2, 1], threshold = 3.5
Output: [5]

Explanation: the median is 1.5 and MAD is 0.5. The spike at index 5 has modified z-score $0.6745\cdot(100-1.5)/0.5 \approx 132.9 \gg 3.5$, while every other point scores under 1 in magnitude.
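
The arithmetic in this explanation can be checked step by step. The snippet below is only a verification sketch; the variable names are illustrative and not part of the problem.

```python
import numpy as np

x = np.array([1, 2, 1, 2, 1, 100, 2, 1], dtype=float)

med = np.median(x)                  # 1.5
mad = np.median(np.abs(x - med))    # 0.5
z = 0.6745 * (x - med) / mad        # modified z-scores

spike_score = z[5]                          # about 132.9
others_max = np.abs(np.delete(z, 5)).max()  # 0.6745, well under the 3.5 cutoff
```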

Constraints

  • Use the median and MAD, not the mean and std — the whole point is robustness to the outliers.
  • The constant $0.6745$ scales MAD to approximate a standard deviation for normal data.
  • Handle MAD == 0 with the mean-absolute-deviation fallback; if all values are equal, return an empty array.
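
For readers wondering where the two constants come from, both follow from properties of the normal distribution. The sketch below recomputes them with the standard library. The names `mad_factor` and `mean_ad_factor` are illustrative only.

```python
import math
from statistics import NormalDist

# For normal data, half of the probability mass lies within the 0.75 quantile
# of the mean, so MAD = sigma * Phi^{-1}(0.75).
mad_factor = NormalDist().inv_cdf(0.75)    # about 0.6745

# For normal data, E|X - mu| = sigma * sqrt(2 / pi),
# so sigma = meanAD * sqrt(pi / 2).
mean_ad_factor = math.sqrt(math.pi / 2)    # about 1.2533
```

Multiplying a deviation by 0.6745 / MAD, or dividing it by 1.2533 · meanAD, therefore puts it on roughly the same scale as an ordinary z-score when the bulk of the data is normal.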

Notes

  • A threshold of 3.5 flags points beyond ~3.5 robust standard deviations — a common default from the original paper.
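  • The mean/std z-score suffers from masking: with $n$ points and the sample standard deviation, no point's z-score can exceed $(n-1)/\sqrt{n}$ in magnitude. For $n \le 10$ that bound is below 3 (about 2.85 at $n = 10$), so nothing can be flagged at threshold 3 no matter how extreme the outlier.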
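
The masking bound is easy to see numerically. The sketch below assumes the sample standard deviation (`ddof=1`), and the helper `max_abs_z` is a hypothetical name introduced for illustration.

```python
import numpy as np

def max_abs_z(x):
    """Largest classic z-score magnitude, using the sample standard deviation."""
    x = np.asarray(x, dtype=float)
    return np.abs((x - x.mean()) / x.std(ddof=1)).max()

n = 10
x = np.ones(n)
x[0] = 1e6                      # one enormous outlier among nine identical values

bound = (n - 1) / np.sqrt(n)    # about 2.846
score = max_abs_z(x)            # hits the bound, still below 3
```

However large the outlier is made, `score` stays at the bound, so a cutoff of 3 never fires for this series.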

This problem ships 5 hidden tests, which run in the browser via Pyodide:

  • Reference example flags the spike at index 5
  • Clean data yields no anomalies
  • Detects multiple outliers (nonzero MAD)
  • All-identical series has no anomalies
  • MAD == 0 falls back to mean absolute deviation