Generate random subsets (bootstrap) [Easy]

Background

Bootstrapping builds many resampled datasets by drawing rows from the original data, usually with replacement, so each bootstrap sample has the same size as the original but typically contains repeated rows. It underpins bagging (random forests train each tree on a bootstrap sample) and nonparametric confidence intervals. Sampling without replacement instead yields random subsets of distinct rows (as used in subbagging or pasting).
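A minimal sketch of the two sampling modes, using NumPy's legacy global RNG as the problem statement does. The seed and the toy array of 10 row indices are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, for illustration only
n = 10
rows = np.arange(n)  # stand-in for the row indices of a dataset

# With replacement: same size as the data, repeated rows are likely
boot = np.random.choice(rows, size=n, replace=True)
# Without replacement: every drawn row is distinct
sub = np.random.choice(rows, size=n // 2, replace=False)

print("bootstrap:", boot, "distinct rows:", len(np.unique(boot)))
print("subsample:", sub, "distinct rows:", len(np.unique(sub)))
```

The bootstrap sample usually has fewer than 10 distinct rows, while the subsample always has exactly 5.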

Problem statement

Implement get_random_subsets(X, y, n_subsets, replacements=True, seed=42) returning a list of n_subsets subsets (X_sub, y_sub):

  • Seed the RNG with np.random.seed(seed).
  • Subset size is n (the number of rows in X) if replacements is True, else n // 2.
  • For each subset, draw row indices with np.random.choice(n, subset_size, replace=replacements) and gather the matching rows of X and y.
  • Return each subset as (X_sub.tolist(), y_sub.tolist()).
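The steps above can be sketched as follows. This is one straightforward way to meet the specification, not necessarily the reference solution; the toy X and y in the usage lines are made up for illustration.

```python
import numpy as np


def get_random_subsets(X, y, n_subsets, replacements=True, seed=42):
    """Return n_subsets (X_sub, y_sub) tuples of Python lists."""
    np.random.seed(seed)  # seed NumPy's global RNG once, before any draws
    n = X.shape[0]
    subset_size = n if replacements else n // 2
    subsets = []
    for _ in range(n_subsets):
        # One index array per subset, reused for X and y so rows stay paired
        idx = np.random.choice(n, subset_size, replace=replacements)
        subsets.append((X[idx].tolist(), y[idx].tolist()))
    return subsets


# Toy data: row i of X is [2i, 2i + 1] and its label is i + 1
X = np.arange(10).reshape(5, 2)
y = np.arange(1, 6)
half = get_random_subsets(X, y, n_subsets=3, replacements=False)
full = get_random_subsets(X, y, n_subsets=3, replacements=True)
print(half[0])
```

Seeding once at the top (rather than inside the loop) is what makes the subsets differ from each other while the whole list stays reproducible for a given seed.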

Input

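  • X: np.ndarray of shape (n, m), the features.
  • y: np.ndarray of shape (n,), the targets.
  • n_subsets: int, the number of subsets to generate.
  • replacements: bool; True samples with replacement (full-size bootstrap), False samples without replacement (half-size subset).
  • seed: int, the RNG seed.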

Output

A list of n_subsets tuples (X_sub, y_sub), each a pair of Python lists, with rows of X and y kept in correspondence.

Examples

Example 1

Input:  X = 5 rows, y = [1..5], n_subsets = 3, replacements = False
Output: 3 subsets, each with 2 (= 5//2) samples drawn without replacement;
        every X row stays paired with its original y.

Explanation: with replacements=False each subset holds n//2 distinct samples; the same index selects from both X and y, preserving the feature–label pairing.
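One of Example 1's subsets can be traced by hand with a few lines of NumPy. The data below is hypothetical (the example does not give X's values), built so each row's first feature equals its label, which makes the pairing easy to check.

```python
import numpy as np

# Hypothetical Example 1 data: 5 rows, first feature equals the label
y = np.array([1, 2, 3, 4, 5])
X = np.column_stack([y, 10 * y])  # row i is [y[i], 10 * y[i]]

np.random.seed(42)
n = len(y)
idx = np.random.choice(n, n // 2, replace=False)  # 5 // 2 = 2 distinct indices
X_sub, y_sub = X[idx], y[idx]                     # same indices for X and y

for row, label in zip(X_sub.tolist(), y_sub.tolist()):
    print(row, "->", label)
```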

Constraints

  • Use the same sampled indices for X and y so pairs stay aligned.
  • Subset size is n (with replacement) or n // 2 (without).
  • Seed the RNG so results are reproducible for a given seed.
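A small sketch of why the constraints matter: reseeding replays identical draws, and a single shared index array keeps pairs intact while separate draws for X and y do not. The data (feature always twice the label), sizes, and seed are hypothetical.

```python
import numpy as np

# Hypothetical data: the single feature is always twice the label
y = np.arange(100)
X = 2 * y.reshape(-1, 1)

# Reproducibility: reseeding replays exactly the same index draws
np.random.seed(7)
first = np.random.choice(100, 50, replace=False)
np.random.seed(7)
second = np.random.choice(100, 50, replace=False)
print("same draws:", np.array_equal(first, second))

# Alignment: one shared index array keeps every (feature, label) pair intact
aligned = np.all(X[first, 0] == 2 * y[first])
# Separate draws for X and y scramble the pairing
ix = np.random.choice(100, 50, replace=False)
iy = np.random.choice(100, 50, replace=False)
scrambled = np.all(X[ix, 0] == 2 * y[iy])
print("shared indices aligned:", aligned, "| independent draws aligned:", scrambled)
```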

Notes

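  • With replacement, a bootstrap sample omits about 37% of the original rows on average (1/e ≈ 36.8% for large n). These left-out rows form the "out-of-bag" set, which random forests reuse as a built-in validation set for estimating generalization error.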
  • Each subset must keep X and y index-aligned; shuffling them independently would destroy the labels.
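The out-of-bag figure follows from the chance that a given row is missed in all n draws, (1 − 1/n)^n, which tends to 1/e ≈ 0.368 as n grows. A quick simulation (n = 1000 and 200 bootstrap samples are arbitrary choices) agrees with it:

```python
import numpy as np

n = 1000
exact = (1 - 1 / n) ** n   # P(a given row is never drawn in n draws with replacement)
limit = np.exp(-1)         # limit as n -> infinity, about 0.368

np.random.seed(0)
n_boot = 200
oob = []
for _ in range(n_boot):
    idx = np.random.choice(n, n, replace=True)
    oob.append(1 - len(np.unique(idx)) / n)  # fraction of rows never sampled

print(f"exact={exact:.4f} limit={limit:.4f} simulated={np.mean(oob):.4f}")
```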

The 5 hidden tests (run in the browser via Pyodide) check the following:

  • Returns the requested number of subsets
  • Without replacement, subsets have n//2 distinct samples
  • With replacement, subsets are full size n
  • X and y stay paired by the same indices
  • Reproducible with the same seed