k-fold cross-validation
Background
k-fold cross-validation estimates how well a model generalizes without relying on a single fixed test split. The dataset is partitioned into k folds of nearly equal size; each fold serves once as the validation set while the remaining k - 1 folds are used for training. Averaging the k scores gives a more stable estimate than one train/test split.
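To make the averaging step concrete, here is a minimal sketch. The "model" is an assumption chosen only for illustration: it predicts the mean of the training targets, and each fold is scored with mean squared error. The helper name mean_baseline_cv_mse is hypothetical, and the folds are contiguous (no shuffling).

```python
import numpy as np

def mean_baseline_cv_mse(y, k=5):
    """Score a predict-the-training-mean baseline with k-fold CV (illustrative only)."""
    y = np.asarray(y, dtype=float)
    # np.array_split yields k contiguous chunks; the first len(y) % k get one extra item.
    folds = np.array_split(np.arange(len(y)), k)
    scores = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except fold i.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        prediction = y[train_idx].mean()
        # Mean squared error on the held-out fold.
        scores.append(float(np.mean((y[test_idx] - prediction) ** 2)))
    # Return the per-fold scores and their average.
    return scores, float(np.mean(scores))

scores, average = mean_baseline_cv_mse([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], k=5)
print(scores)   # one score per fold
print(average)  # the cross-validated estimate
```

The per-fold scores vary considerably on this toy data, which is the reason for reporting their average rather than any single split.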
Problem statement
Implement k_fold_cross_validation(X, y, k=5, shuffle=True, random_seed=42) that returns a list of k tuples (train_indices, test_indices), where each element is a Python list of integer indices into the data.
- If shuffle is True, permute the indices with np.random.default_rng(random_seed) before splitting.
- Folds must partition the data: when n is not divisible by k, the first n % k folds get one extra element.
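One possible way to meet this specification is sketched below. It is not presented as the reference solution: using rng.permutation for the shuffle and keeping the training indices in their (possibly permuted) order are assumptions, so a shuffled result may differ from what a particular grader expects.

```python
import numpy as np

def k_fold_cross_validation(X, y, k=5, shuffle=True, random_seed=42):
    """Return k (train_indices, test_indices) tuples of plain Python ints."""
    n = len(X)
    if shuffle:
        # Assumption: permute 0..n-1 with a seeded Generator.
        rng = np.random.default_rng(random_seed)
        indices = rng.permutation(n).tolist()
    else:
        indices = list(range(n))

    base, extra = divmod(n, k)  # the first `extra` folds get base + 1 elements
    folds = []
    start = 0
    for i in range(k):
        stop = start + base + (1 if i < extra else 0)
        test_idx = indices[start:stop]
        # Everything outside the current slice is training data.
        train_idx = indices[:start] + indices[stop:]
        folds.append((train_idx, test_idx))
        start = stop
    return folds

data = list(range(10))
folds = k_fold_cross_validation(data, data, k=5, shuffle=False)
print(folds[0])  # ([2, 3, 4, 5, 6, 7, 8, 9], [0, 1])
```

Working on a plain list of indices keeps the slicing simple and avoids returning NumPy integer types.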
Input
- X, y: arrays of the same length n (only their length is used here).
- k: int, number of folds.
- shuffle: bool, whether to shuffle indices first.
- random_seed: int, seed used when shuffling.
Output
A list of k tuples (train_indices, test_indices), where train_indices and test_indices are each a list of int. Across all folds, the test sets are disjoint and together cover every index exactly once.
Examples
Example 1
Input: X = y = [0,1,2,3,4,5,6,7,8,9], k = 5, shuffle = False
Output: [([2,3,4,5,6,7,8,9], [0,1]),
([0,1,4,5,6,7,8,9], [2,3]),
([0,1,2,3,6,7,8,9], [4,5]),
([0,1,2,3,4,5,8,9], [6,7]),
([0,1,2,3,4,5,6,7], [8,9])]
Explanation: with 10 points and k=5, each fold holds out 2 consecutive indices as the test set; the other 8 form the training set.
Constraints
- Each test fold appears exactly once; the union of all test folds is every index.
- Distribute the remainder: the first n % k folds are one larger than the rest.
- Return lists of plain Python ints (use .tolist()).
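The constraints above can be checked mechanically. The following is a small sketch of such a check; the function name check_folds and its signature are hypothetical, and it additionally assumes that each training set is exactly the complement of its test fold, as the Background section describes.

```python
def check_folds(folds, n, k):
    """Return True if `folds` satisfies the constraints for n samples and k folds."""
    if len(folds) != k:
        return False
    base, extra = divmod(n, k)
    all_test = []
    for i, (train, test) in enumerate(folds):
        # The first n % k folds must hold one extra element.
        if len(test) != base + (1 if i < extra else 0):
            return False
        # Train and test must not share indices, and together must cover 0..n-1.
        if set(train) & set(test):
            return False
        if sorted(train + test) != list(range(n)):
            return False
        # Plain Python ints only (NumPy integer types fail this check).
        if not all(type(v) is int for v in train + test):
            return False
        all_test.extend(test)
    # Every index appears in exactly one test fold.
    return sorted(all_test) == list(range(n))

good = [([3, 4], [0, 1, 2]), ([0, 1, 2], [3, 4])]
bad = [([2, 3, 4], [0, 1]), ([0, 1], [2, 3, 4])]  # extra element in the wrong fold
print(check_folds(good, n=5, k=2))  # True
print(check_folds(bad, n=5, k=2))   # False
```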
Notes
- Shuffling before splitting matters when the data is ordered (e.g. sorted by label) — otherwise a fold could be all one class.
- Setting k = n gives leave-one-out cross-validation.
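Both notes can be seen on toy data. The sketch below uses a hypothetical helper, contiguous_test_folds, that splits unshuffled indices; the label list is made up for illustration.

```python
def contiguous_test_folds(n, k):
    """Split indices 0..n-1 into k contiguous test folds, without shuffling."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        stop = start + base + (1 if i < extra else 0)
        folds.append(list(range(start, stop)))
        start = stop
    return folds

# Labels sorted by class: without shuffling, each test fold contains a single class.
labels = [0, 0, 0, 1, 1, 1]
for test_idx in contiguous_test_folds(len(labels), k=2):
    print(test_idx, {labels[i] for i in test_idx})
# [0, 1, 2] {0}
# [3, 4, 5] {1}

# k = n: every test fold holds exactly one index (leave-one-out).
print(contiguous_test_folds(4, k=4))  # [[0], [1], [2], [3]]
```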
This problem ships 5 hidden tests, which run in the browser via Pyodide:
- Reference example (no shuffle)
- Returns exactly k folds
- Test folds partition all indices
- Train and test are disjoint within each fold
- Uneven split puts extras in the first folds