Divide dataset by feature threshold

Difficulty: Easy

Background

Splitting a dataset on a single feature is the atomic operation inside every decision tree: choose a feature and a threshold, then partition the rows into those that pass and those that fail. Supporting both numeric thresholds (compare with >=) and categorical thresholds (compare with ==) lets one routine drive both numeric and categorical splits.

Problem statement

Implement divide_on_feature(X, feature_i, threshold) that partitions the rows of X into two subsets using column feature_i:

$$X_1 = \{\, x \in X : \text{cond}(x_{\text{feature\_i}}) \,\}, \qquad X_2 = X \setminus X_1$$

where the condition is $x_{\text{feature\_i}} \ge \text{threshold}$ when threshold is numeric, and $x_{\text{feature\_i}} = \text{threshold}$ when it is non-numeric.

Input

  • X: np.ndarray of shape (n_samples, n_features).
  • feature_i: int, index of the column to split on.
  • threshold: a numeric or categorical value defining the split.

Output

Returns a list [X_1, X_2] of two numpy arrays: X_1 holds the rows satisfying the condition, X_2 holds the rest. Either subset may be empty.

Examples

Example 1

Input:  X = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]], feature_i = 0, threshold = 5
Output: [[[5, 6], [7, 8], [9, 10]], [[1, 2], [3, 4]]]

Explanation: column 0 values are [1, 3, 5, 7, 9]. Rows with value >= 5 (i.e. 5, 7, 9) form X_1; the rest (1, 3) form X_2.
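
The arithmetic in Example 1 can be traced step by step. The sketch below uses plain Python lists rather than NumPy arrays, purely to keep each intermediate value visible; the names `column` and `passes` are illustrative and not part of the problem.

```python
# Rows from Example 1, kept as plain lists so each step is visible.
X = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
feature_i, threshold = 0, 5

# Pull out the column being tested: [1, 3, 5, 7, 9].
column = [row[feature_i] for row in X]

# Numeric threshold, so the condition is ">=".
passes = [value >= threshold for value in column]

# Walking X in order keeps the original row order in both subsets.
X_1 = [row for row, keep in zip(X, passes) if keep]
X_2 = [row for row, keep in zip(X, passes) if not keep]

print([X_1, X_2])
# [[[5, 6], [7, 8], [9, 10]], [[1, 2], [3, 4]]]
```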

Constraints

  • Numeric threshold → use >=; non-numeric threshold → use ==.
  • Preserve the original row order within each subset.
  • Return exactly two arrays as [X_1, X_2]; either may be empty.
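
One way to satisfy these constraints is a boolean mask over the chosen column. The following is a minimal sketch, not a reference solution: it assumes "numeric" means a Python int or float or a NumPy numeric scalar (so a bool threshold would also count as numeric), and the helper names `is_numeric` and `mask` are illustrative.

```python
import numpy as np


def divide_on_feature(X, feature_i, threshold):
    """Split the rows of X into [X_1, X_2] on column feature_i."""
    X = np.asarray(X)
    column = X[:, feature_i]

    # Assumption: int, float, and NumPy numeric scalars count as numeric.
    is_numeric = isinstance(threshold, (int, float, np.number))

    # Numeric threshold -> ">=", anything else -> "==".
    mask = column >= threshold if is_numeric else column == threshold

    # Boolean indexing preserves row order; an all-False mask gives
    # an empty array of shape (0, n_features).
    return [X[mask], X[~mask]]


X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
X_1, X_2 = divide_on_feature(X, 0, 5)
print(X_1.tolist())  # [[5, 6], [7, 8], [9, 10]]
print(X_2.tolist())  # [[1, 2], [3, 4]]
```

Because the second subset is taken with the negated mask, the two subsets always cover every row exactly once.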

Notes

  • This is the workhorse behind tree node splits: pair it with an impurity criterion (Gini / entropy) and a search over candidate features and thresholds, and you have a complete decision-tree split.
  • Using >= (rather than >) means a sample exactly equal to the threshold lands in the first subset.
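
To illustrate the first note, a split can be scored with Gini impurity. The sketch below is an illustration under stated assumptions, not part of the problem: the class label is assumed to sit in the last column of a combined array `Xy`, only the numeric (>=) case is handled, and `gini` and `split_gini` are hypothetical helper names.

```python
import numpy as np


def gini(labels):
    """Gini impurity 1 - sum(p_k^2) of a 1-D label array (0.0 if empty)."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))


def split_gini(Xy, feature_i, threshold):
    """Size-weighted Gini impurity of a numeric split on column feature_i.

    Assumption: the last column of Xy stores the class label.
    """
    mask = Xy[:, feature_i] >= threshold  # same rule as the numeric case
    X_1, X_2 = Xy[mask], Xy[~mask]
    weighted = len(X_1) * gini(X_1[:, -1]) + len(X_2) * gini(X_2[:, -1])
    return weighted / len(Xy)


# Toy data: one feature plus a 0/1 label in the last column.
Xy = np.array([[1, 0], [3, 0], [5, 1], [7, 1], [9, 1]])
print(split_gini(Xy, 0, 5))  # 0.0: both subsets are pure
print(split_gini(Xy, 0, 7))  # larger: the "< 7" subset mixes both classes
```

A tree builder would evaluate this score for many candidate (feature, threshold) pairs and keep the lowest one.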

This problem ships 4 hidden tests, which run in your browser via Pyodide (no backend, no submission queue):

  • Numeric split: rows with feature >= threshold go first
  • Partition is complete (subset sizes sum to n)
  • Categorical (string) threshold uses equality
  • Threshold above all values yields an empty first subset