Reservoir sampling
Background
Reservoir sampling picks k items uniformly at random from a stream whose length you do not know in advance and cannot fully store. Algorithm R keeps a k-slot "reservoir": the first k items fill it, then each later item (index i) replaces a random reservoir slot with probability k/(i+1). When the stream ends, every item has had an equal k/n chance of being in the reservoir.
Problem statement
Implement reservoir_sample(stream, k, seed=0) returning k sampled items:
- Fill the reservoir with the first
kitems. - For each later index
i(≥k), drawjuniform in{0,...,i}; ifj < k, setreservoir[j] = stream[i].
Use np.random.default_rng(seed). If the stream has ≤ k items, return all of them.
Input
stream— an iterable/sequence of items.k—int, reservoir size.seed—int, RNG seed.
Output
An np.ndarray of up to k sampled items (all elements if the stream is shorter than k).
Examples
Example 1
Input: stream = [10, 20, 30, 40, 50], k = 2
Output: 2 items from the stream; over many seeds each item appears with probability 2/5
Explanation: the first 2 items seed the reservoir; items 3–5 each get a chance to replace a slot, balanced so all 5 items are equally likely to survive.
Constraints
- The replacement index
jis drawn from{0,...,i}inclusive; replace only whenj < k. - One pass over the stream, storing at most
kitems. - If
len(stream) <= k, return everything.
Notes
- The magic is the shrinking acceptance probability
k/(i+1): it exactly cancels the chance of later eviction so the final distribution is uniform. - It is the standard way to sample from logs, Kafka streams, or any data too large to hold in memory.
This problem ships 5 hidden tests. They run in your browser via Pyodide — no backend, no submission queue. Press ▶ Run tests to execute.
- •Returns k items, all drawn from the stream
- •Stream shorter than k returns everything
- •Samples are distinct (sampling without replacement)
- •Uniformity: each element survives with probability k/n
- •Reproducible with the same seed