k-fold cross-validation (difficulty: Easy)

Background

k-fold cross-validation estimates how well a model generalizes without wasting data on a single fixed test split. The dataset is partitioned into k folds of (nearly) equal size; each fold serves once as the validation set while the remaining k - 1 folds are used for training. Averaging the k scores gives a more stable estimate than one train/test split.
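To make the averaging step concrete, here is a minimal sketch of the full loop on toy data. The data, the straight-line model fitted with np.polyfit, and the mean-squared-error score are all assumptions chosen for illustration; the problem itself only asks for the index splits.

```python
import numpy as np

# Toy data (assumed for illustration): y is a noisy linear function of x.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=20)
y = 3 * x + rng.normal(0, 0.1, size=20)

k = 5
folds = np.array_split(np.arange(len(x)), k)  # k contiguous index blocks

scores = []
for i in range(k):
    test = folds[i]
    # Training indices are every fold except fold i.
    train = np.concatenate([f for j, f in enumerate(folds) if j != i])
    # Fit a straight line on the training part only.
    slope, intercept = np.polyfit(x[train], y[train], deg=1)
    pred = slope * x[test] + intercept
    # Score on the held-out fold (mean squared error).
    scores.append(float(np.mean((pred - y[test]) ** 2)))

# The cross-validation estimate is the mean of the k per-fold scores.
mean_score = sum(scores) / k
print(len(scores), mean_score)
```

Each data point is used for validation exactly once and for training k - 1 times, which is why no data is set aside permanently.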

Problem statement

Implement k_fold_cross_validation(X, y, k=5, shuffle=True, random_seed=42) that returns a list of k tuples (train_indices, test_indices), where each element is a Python list of integer indices into the data.

  • If shuffle is True, permute the indices with np.random.default_rng(random_seed) before splitting.
  • Folds must partition the data: when n is not divisible by k, the first n % k folds get one extra element.
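The remainder rule can be sketched on its own before writing the full function. The helper name fold_sizes below is hypothetical and not part of the required interface; it only shows how n % k extra elements end up in the first folds.

```python
def fold_sizes(n, k):
    """Sizes of k folds over n items; the first n % k folds get one extra."""
    base, extra = divmod(n, k)
    return [base + 1 if i < extra else base for i in range(k)]


print(fold_sizes(10, 5))  # [2, 2, 2, 2, 2]
print(fold_sizes(10, 3))  # [4, 3, 3]
print(fold_sizes(11, 4))  # [3, 3, 3, 2]
```

The sizes always sum to n, so consecutive slices of these lengths partition the (possibly shuffled) index array.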

Input

  • X, y: arrays of the same length n (only their length is used here).
  • k: int, number of folds.
  • shuffle: bool, whether to shuffle indices first.
  • random_seed: int, seed used when shuffling.

Output

A list of k tuples (train_indices, test_indices), where train_indices and test_indices are each a list of int. Across all folds, the test sets are disjoint and together cover every index exactly once.

Examples

Example 1

Input:  X = y = [0,1,2,3,4,5,6,7,8,9], k = 5, shuffle = False
Output: [([2,3,4,5,6,7,8,9], [0,1]),
         ([0,1,4,5,6,7,8,9], [2,3]),
         ([0,1,2,3,6,7,8,9], [4,5]),
         ([0,1,2,3,4,5,8,9], [6,7]),
         ([0,1,2,3,4,5,6,7], [8,9])]

Explanation: with 10 points and k=5, each fold holds out 2 consecutive indices as the test set; the other 8 form the training set.

Constraints

  • Each index appears in exactly one test fold; the union of all test folds is every index.
  • Distribute the remainder: the first n % k folds are one larger than the rest.
  • Return lists of plain Python ints (use .tolist()).
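One possible way to satisfy these constraints is sketched below. It is not the reference solution: it relies on np.array_split, which places the larger pieces first when the length is not divisible by k, and on rng.permutation for the shuffle. The check that k lies between 2 and n is an added assumption, since the statement does not say what should happen otherwise.

```python
import numpy as np


def k_fold_cross_validation(X, y, k=5, shuffle=True, random_seed=42):
    n = len(X)
    # Assumption: the statement does not define behaviour for k < 2 or k > n.
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")

    indices = np.arange(n)
    if shuffle:
        rng = np.random.default_rng(random_seed)
        indices = rng.permutation(indices)

    # array_split gives the first n % k pieces one extra element.
    folds = np.array_split(indices, k)

    splits = []
    for i in range(k):
        test_idx = folds[i]
        # Everything outside fold i, in the order it appears in `indices`.
        train_idx = np.concatenate(folds[:i] + folds[i + 1:])
        # .tolist() converts NumPy integers to plain Python ints.
        splits.append((train_idx.tolist(), test_idx.tolist()))
    return splits


data = list(range(10))
splits = k_fold_cross_validation(data, data, k=5, shuffle=False)
print(splits[0])  # ([2, 3, 4, 5, 6, 7, 8, 9], [0, 1])
```

With shuffle=False this reproduces Example 1. With shuffle=True the exact index order depends on how the permutation is drawn, so a grader that compares shuffled output literally may expect a specific call.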

Notes

  • Shuffling before splitting matters when the data is ordered (e.g. sorted by label) — otherwise a fold could be all one class.
  • Setting k = n gives leave-one-out cross-validation.
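Both notes can be seen on a small example. The sketch below assumes ten samples whose labels are sorted by class and splits the indices with np.array_split; the labels and the choice of two folds are illustrative only.

```python
import numpy as np

# Labels sorted by class: five 0s followed by five 1s (assumed toy data).
labels = np.array([0] * 5 + [1] * 5)
idx = np.arange(len(labels))

# Without shuffling, two contiguous folds each contain a single class.
unshuffled = np.array_split(idx, 2)
classes_per_fold = [sorted(set(labels[f].tolist())) for f in unshuffled]
print(classes_per_fold)  # [[0], [1]]

# Shuffling first makes it likely (though not guaranteed) that folds mix classes.
shuffled = np.array_split(np.random.default_rng(42).permutation(idx), 2)
print([labels[f].tolist() for f in shuffled])

# k = n: every test fold holds exactly one index (leave-one-out).
loo = np.array_split(idx, len(idx))
print([f.tolist() for f in loo])  # [[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]]
```

Shuffling does not guarantee balanced classes per fold; stratified splitting addresses that but is outside the scope of this problem.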

This problem ships 5 hidden tests. They run in your browser via Pyodide, with no backend and no submission queue.

  • Reference example (no shuffle)
  • Returns exactly k folds
  • Test folds partition all indices
  • Train and test are disjoint within each fold
  • Uneven split puts extras in the first folds
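The hidden tests themselves are not shown, but the properties named in the list can be approximated locally. The helper check_folds below is a hypothetical sketch, not the actual test suite; it is applied here to the output given in Example 1.

```python
def check_folds(splits, n, k):
    """Assert the partition properties of a list of (train, test) index lists."""
    # Returns exactly k folds.
    assert len(splits) == k

    all_test = []
    for train, test in splits:
        # Train and test are disjoint within each fold...
        assert not set(train) & set(test)
        # ...and together they account for every index exactly once.
        assert sorted(train + test) == list(range(n))
        all_test.extend(test)

    # Test folds partition all indices.
    assert sorted(all_test) == list(range(n))

    # Uneven split puts extras in the first folds.
    base, extra = divmod(n, k)
    sizes = [len(test) for _, test in splits]
    assert sizes == [base + 1] * extra + [base] * (k - extra)
    return True


# Output from Example 1 (n = 10, k = 5, shuffle = False).
example_splits = [
    ([2, 3, 4, 5, 6, 7, 8, 9], [0, 1]),
    ([0, 1, 4, 5, 6, 7, 8, 9], [2, 3]),
    ([0, 1, 2, 3, 6, 7, 8, 9], [4, 5]),
    ([0, 1, 2, 3, 4, 5, 8, 9], [6, 7]),
    ([0, 1, 2, 3, 4, 5, 6, 7], [8, 9]),
]
print(check_folds(example_splits, n=10, k=5))  # True
```

The same helper can be pointed at the return value of your own k_fold_cross_validation for uneven cases such as n = 11, k = 4.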