Inverse-transform samplingEasy

Inverse-transform sampling

Background

Inverse-transform sampling turns uniform random numbers into samples from any distribution with a known CDF. For a discrete distribution with pmf pp, you build the cumulative distribution FF and, for each uU(0,1)u\sim\mathcal U(0,1), return the first category whose cumulative probability reaches uu. It is the standard way numpy/random draw categorical samples.

Problem statement

Implement inverse_transform_sample(pmf, n, seed=0) drawing n integer category indices:

Fk=jkpj,uiU(0,1),xi=min{k:ui<Fk}F_k = \sum_{j\le k} p_j, \qquad u_i \sim \mathcal U(0,1), \qquad x_i = \min\{k : u_i < F_k\}

Build the cumulative sum of pmf and use np.searchsorted(cdf, u, side='right') to map each uniform to a category.

Input

  • pmfnp.ndarray (K,), a probability mass function (sums to 1, non-negative).
  • nint, number of samples.
  • seedint, RNG seed (np.random.default_rng).

Output

An np.ndarray of n integer indices in [0, K).

Examples

Example 1

Input:  pmf = [0.2, 0.3, 0.5], n = large
Output: indices 0/1/2 occurring ~20% / 30% / 50% of the time

Explanation: the CDF is [0.2,0.5,1.0][0.2, 0.5, 1.0]. A uniform uu lands in category 0 with probability 0.2 (when u<0.2u<0.2), category 1 with 0.3, and category 2 with 0.5 — reproducing the pmf.

Constraints

  • Map each uniform via the cumulative distribution (searchsorted), not by looping per category.
  • Returned indices are integers in [0, len(pmf)).
  • A degenerate pmf (all mass on one category) must always return that category.

Notes

  • The method generalizes to continuous distributions: x=F1(u)x = F^{-1}(u) with the inverse CDF (e.g. ln(1u)/λ-\ln(1-u)/\lambda for an exponential).
  • Using side='right' ensures u=0u=0 maps into the first category that actually carries probability mass.
Python
Loading...

This problem ships 5 hidden tests. They run in your browser via Pyodide — no backend, no submission queue. Press ▶ Run tests to execute.

  • Empirical frequencies match the pmf
  • Indices are within range
  • Degenerate pmf always returns the same category
  • Returns exactly n samples
  • Reproducible with the same seed