Generate random subsets (bootstrap)
Background
Bootstrapping builds many resampled datasets by drawing rows from the original, usually with replacement, so that each bootstrap sample is the same size as the data but may contain repeated rows. It underpins bagging (random forests train each tree on a bootstrap sample) and nonparametric confidence intervals. Sampling without replacement instead yields random subsets of distinct rows (used for subbagging / pasting).
Problem statement
Implement get_random_subsets(X, y, n_subsets, replacements=True, seed=42) returning a list of n_subsets subsets (X_sub, y_sub):
- Seed the RNG with np.random.seed(seed).
- Subset size is n if replacements else n // 2.
- For each subset, draw row indices with np.random.choice(n, subset_size, replace=replacements) and gather the matching rows of X and y.
- Return each subset as (X_sub.tolist(), y_sub.tolist()).
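The steps above can be sketched as follows. This is one possible reading of the statement, not an official solution; the np.asarray conversions and the local names n, subset_size, and idx are assumptions added for illustration.

```python
import numpy as np


def get_random_subsets(X, y, n_subsets, replacements=True, seed=42):
    """Return n_subsets (X_sub, y_sub) pairs as nested Python lists."""
    # Assumption: accept array-likes by converting them up front.
    X = np.asarray(X)
    y = np.asarray(y)

    np.random.seed(seed)  # seed the global RNG, as the statement asks
    n = X.shape[0]
    # Full-size bootstrap with replacement, half-size subset without.
    subset_size = n if replacements else n // 2

    subsets = []
    for _ in range(n_subsets):
        # One index draw per subset, reused for both X and y.
        idx = np.random.choice(n, subset_size, replace=replacements)
        subsets.append((X[idx].tolist(), y[idx].tolist()))
    return subsets
```

Drawing a single index array and applying it to both X and y is what keeps each feature row attached to its label.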
Input
- X: np.ndarray (n, m), features.
- y: np.ndarray (n,), targets.
- n_subsets: int, number of subsets.
- replacements: bool; with replacement gives a full-size bootstrap sample, without gives a half-size subset.
- seed: int.
Output
A list of n_subsets tuples (X_sub, y_sub), each a pair of Python lists, with rows of X and y kept in correspondence.
Examples
Example 1
Input: X = 5 rows, y = [1..5], n_subsets = 3, replacements = False
Output: 3 subsets, each with 2 (= 5//2) samples drawn without replacement;
every X row stays paired with its original y.
Explanation: with replacements=False each subset holds n//2 distinct samples; the same index selects from both X and y, preserving the feature–label pairing.
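To make Example 1 concrete, the snippet below runs the same sampling steps on hypothetical toy data. The values in X are an assumption chosen so that each row visibly encodes its label (the row paired with y = 3 starts with 30); the example itself only says that X has 5 rows.

```python
import numpy as np

# Hypothetical toy data: the row paired with label k starts with 10 * k.
X = np.array([[10, 11], [20, 21], [30, 31], [40, 41], [50, 51]])
y = np.array([1, 2, 3, 4, 5])

np.random.seed(42)
n = len(X)
subset_size = n // 2  # 5 // 2 == 2, since replacements=False

subsets = []
for _ in range(3):
    # Distinct indices, applied to X and y alike.
    idx = np.random.choice(n, subset_size, replace=False)
    subsets.append((X[idx].tolist(), y[idx].tolist()))

for X_sub, y_sub in subsets:
    print(X_sub, y_sub)
```

Each printed pair should show two rows whose first entry is ten times the label next to it, which is a quick visual check that the pairing survived the sampling.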
Constraints
- Use the same sampled indices for X and y so pairs stay aligned.
- Subset size is n (with replacement) or n // 2 (without).
- Seed the RNG so results are reproducible for a given seed.
Notes
- With replacement, a bootstrap sample omits ~37% of the original rows on average (the "out-of-bag" set), which random forests reuse for free validation.
- Each subset must keep X and y index-aligned; shuffling them independently would pair feature rows with the wrong labels.
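The ~37% figure in the first note comes from the chance that a given row is never picked in n draws with replacement, (1 - 1/n)^n, which approaches 1/e as n grows. The simulation below is a rough check of that figure; the dataset size and number of trials are arbitrary assumptions.

```python
import numpy as np

n = 1000        # hypothetical dataset size
n_trials = 200  # hypothetical number of bootstrap samples

np.random.seed(42)

oob_fractions = []
for _ in range(n_trials):
    # Full-size bootstrap sample of row indices.
    idx = np.random.choice(n, n, replace=True)
    # Rows that never appear in idx are "out of bag".
    n_unique = np.unique(idx).size
    oob_fractions.append(1 - n_unique / n)

mean_oob = float(np.mean(oob_fractions))
expected = (1 - 1 / n) ** n  # chance that a given row is never drawn
print(f"simulated: {mean_oob:.3f}, analytic: {expected:.3f}, 1/e: {np.exp(-1):.3f}")
```

The simulated and analytic values should agree closely, and both should sit near 1/e.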
This problem ships 5 hidden tests:
- Returns the requested number of subsets
- Without replacement, subsets have n//2 distinct samples
- With replacement, subsets are full size n
- X and y stay paired by the same indices
- Reproducible with the same seed