He weight initialization
Background
Weight initialization sets the starting scale of a network's parameters. If weights are too large or too small, signals (and gradients) explode or vanish as they pass through layers. He initialization (He et al., 2015; also called Kaiming initialization) is designed for ReLU networks: it draws weights from a zero-mean normal distribution whose variance is $2/\text{fan\_in}$, which keeps the variance of activations roughly constant across layers.
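To make the exploding/vanishing claim concrete, here is a small sketch that pushes a random batch through a stack of ReLU layers under three weight scales. The depth, width, batch size, and the two "wrong" scales are arbitrary choices for illustration, and `final_second_moment` is a hypothetical helper name, not part of the problem.

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, width, batch = 20, 256, 512  # illustrative sizes (assumptions)

def final_second_moment(scale_fn):
    """Push a random batch through n_layers ReLU layers; return mean(h**2) at the end."""
    h = rng.standard_normal((batch, width))
    for _ in range(n_layers):
        W = rng.standard_normal((width, width)) * scale_fn(width)
        h = np.maximum(h @ W, 0.0)  # linear layer followed by ReLU
    return float(np.mean(h ** 2))

he = final_second_moment(lambda fan_in: np.sqrt(2.0 / fan_in))     # He scale
small = final_second_moment(lambda fan_in: np.sqrt(0.5 / fan_in))  # variance 4x too small
large = final_second_moment(lambda fan_in: np.sqrt(8.0 / fan_in))  # variance 4x too large

print(f"He: {he:.3g}, too small: {small:.3g}, too large: {large:.3g}")
```

With the He scale the mean squared activation should stay within an order of magnitude of its starting value, while the other two scales shrink or grow it by roughly a factor of 4 per layer.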
Problem statement
Implement he_init(fan_in, fan_out, seed=42) that returns a weight matrix of shape (fan_in, fan_out) with entries drawn from:
$$W_{ij} \sim \mathcal{N}\left(0, \frac{2}{\text{fan\_in}}\right)$$
Use np.random.default_rng(seed) and scale standard normal samples by $\sqrt{2/\text{fan\_in}}$.
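One possible shape for the function, as a minimal sketch under the stated requirements rather than the reference solution; the local names `rng` and `std` are illustrative.

```python
import numpy as np

def he_init(fan_in, fan_out, seed=42):
    """Return a (fan_in, fan_out) matrix with entries drawn from N(0, 2 / fan_in)."""
    rng = np.random.default_rng(seed)   # seeded generator for reproducibility
    std = np.sqrt(2.0 / fan_in)         # He scaling factor
    # Standard normal samples have variance 1; multiplying by std gives variance 2 / fan_in.
    return rng.standard_normal((fan_in, fan_out)) * std
```

Creating the generator inside the function means two calls with the same seed return identical matrices, independent of any global RNG state.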
Input
- fan_in: int, number of input units (rows).
- fan_out: int, number of output units (columns).
- seed: int, RNG seed for reproducibility.
Output
An np.ndarray of shape (fan_in, fan_out), each entry drawn from $\mathcal{N}(0, 2/\text{fan\_in})$.
Examples
Example 1
Input: fan_in = 4, fan_out = 3, seed = 0
Output: array of shape (4, 3); empirical std approximately sqrt(2/4) = 0.707
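Explanation: each weight is a standard normal sample multiplied by $\sqrt{2/4} \approx 0.707$, so the entries have mean $0$ and standard deviation $\approx 0.707$. With only 12 entries, the empirical mean and standard deviation fluctuate noticeably around these targets.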
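The example can be reproduced directly, as sketched below. The second, much wider matrix is an addition for illustration (its width of 100,000 is an arbitrary choice) to show the empirical statistics settling near the targets once there are enough samples.

```python
import numpy as np

rng = np.random.default_rng(0)
target_std = np.sqrt(2.0 / 4)  # fan_in = 4

# Example 1: a 4 x 3 matrix, so only 12 samples.
W = rng.standard_normal((4, 3)) * target_std
print(W.shape, W.mean(), W.std())

# Same fan_in, many more columns: statistics should be close to 0 and target_std.
W_wide = rng.standard_normal((4, 100_000)) * target_std
print(W_wide.mean(), W_wide.std(), target_std)
```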
Constraints
- The scaling factor is $\sqrt{2/\text{fan\_in}}$ (the "2" is what distinguishes He from Xavier/Glorot, which uses $\sqrt{1/\text{fan\_in}}$ or $\sqrt{2/(\text{fan\_in} + \text{fan\_out})}$).
- Use np.random.default_rng(seed) so results are reproducible.
- Return a real-valued array of shape (fan_in, fan_out).
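The difference between the He and Xavier/Glorot scales mentioned in the constraints can be seen numerically. The helper names below (`he_std`, `xavier_std_fan_in`, `xavier_std_avg`) are hypothetical and exist only for this comparison; the layer sizes are arbitrary.

```python
import numpy as np

def he_std(fan_in):
    """He: standard deviation sqrt(2 / fan_in)."""
    return np.sqrt(2.0 / fan_in)

def xavier_std_fan_in(fan_in):
    """Xavier/Glorot, fan-in variant: sqrt(1 / fan_in)."""
    return np.sqrt(1.0 / fan_in)

def xavier_std_avg(fan_in, fan_out):
    """Xavier/Glorot, averaged variant: sqrt(2 / (fan_in + fan_out))."""
    return np.sqrt(2.0 / (fan_in + fan_out))

fan_in, fan_out = 256, 128
print("He:               ", he_std(fan_in))
print("Xavier (fan_in):  ", xavier_std_fan_in(fan_in))
print("Xavier (averaged):", xavier_std_avg(fan_in, fan_out))
```

For the same fan_in, the He standard deviation is larger than the fan-in Xavier one by a factor of sqrt(2), and the two Xavier variants coincide when fan_in equals fan_out.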
Notes
- The factor 2 compensates for ReLU zeroing out roughly half of its inputs, which halves the mean squared activation. For tanh you would use Xavier's factor of 1 instead.
- Initializing all weights to the same constant fails to break symmetry: every neuron in a layer would compute the same output, receive the same gradient, and therefore learn the same thing. Random initialization is essential.
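The symmetry point can be checked on a toy one-hidden-layer ReLU network. Everything in this sketch is an assumption made for illustration: the random data, the constant 0.1, the layer sizes, and the hypothetical helper `hidden_weight_grad`.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4))  # batch of 8 inputs with 4 features
y = rng.standard_normal((8, 1))  # arbitrary regression targets

def hidden_weight_grad(W1, W2):
    """Gradient of half the mean squared error with respect to W1."""
    z = x @ W1                   # hidden pre-activations, shape (8, hidden)
    h = np.maximum(z, 0.0)       # ReLU
    err = h @ W2 - y             # prediction error, shape (8, 1)
    dz = (err @ W2.T) * (z > 0)  # backpropagate through output layer and ReLU
    return x.T @ dz / len(x)     # shape (4, hidden), one column per hidden neuron

hidden = 3
g_const = hidden_weight_grad(np.full((4, hidden), 0.1), np.full((hidden, 1), 0.1))
g_rand = hidden_weight_grad(
    rng.standard_normal((4, hidden)) * np.sqrt(2.0 / 4),
    rng.standard_normal((hidden, 1)) * np.sqrt(2.0 / hidden),
)

# Compare every neuron's gradient column against the first neuron's column.
same_const = bool(np.allclose(g_const, g_const[:, :1]))
same_rand = bool(np.allclose(g_rand, g_rand[:, :1]))
print("constant init, identical gradients:", same_const)
print("random init, identical gradients:  ", same_rand)
```

With constant weights, every hidden neuron gets the same gradient column, so a gradient step keeps them identical. Random weights give each neuron its own gradient.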
This problem ships 4 hidden tests. They run in your browser via Pyodide, with no backend and no submission queue.
- Correct output shape
- Reproducible with the same seed
- Empirical std matches sqrt(2/fan_in)
- Mean is approximately zero
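The hidden tests themselves are not shown, so the following is only a guess at what checks of this kind might look like. The function `check_he_init`, the matrix size, and the 5% tolerance are assumptions. A reference `he_init` is included so the sketch runs on its own.

```python
import numpy as np

def he_init(fan_in, fan_out, seed=42):
    # Reference sketch so the checks below have something to run against.
    rng = np.random.default_rng(seed)
    return rng.standard_normal((fan_in, fan_out)) * np.sqrt(2.0 / fan_in)

def check_he_init(init_fn, fan_in=256, fan_out=512, rel_tol=0.05):
    """Run four property checks on init_fn and return a name -> pass/fail dict."""
    W = init_fn(fan_in, fan_out, seed=0)
    target_std = np.sqrt(2.0 / fan_in)
    return {
        "shape": W.shape == (fan_in, fan_out),
        "reproducible": bool(np.array_equal(W, init_fn(fan_in, fan_out, seed=0))),
        "std": bool(abs(W.std() - target_std) < rel_tol * target_std),
        "mean": bool(abs(W.mean()) < rel_tol * target_std),
    }

results = check_he_init(he_init)
print(results)
```

A large matrix is used because the empirical standard deviation and mean are only reliable estimates when there are many samples. A 4 x 3 matrix would be too noisy for a tight tolerance.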