Dropout layer (forward & backward) (Medium)


Background

Dropout is a regularizer that, during training, zeroes each activation independently with probability $p$, which forces the network not to rely on any single unit. The modern form is inverted dropout: surviving activations are scaled up by $1/(1-p)$ during training so that the layer's expected output is unchanged, which lets you leave the layer as a no-op at inference time.
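A minimal stateless sketch of inverted dropout, assuming NumPy. The function name `inverted_dropout` and its `rng` parameter are hypothetical and not part of the required API; the point is only to show that masking plus a $1/(1-p)$ rescale happens at train time, while inference is a plain pass-through.

```python
import numpy as np


def inverted_dropout(x, p, training=True, rng=None):
    """Hypothetical helper: inverted dropout without any stored state."""
    if not training:
        return x  # no-op at inference: the train-time rescale already fixed the expectation
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(np.shape(x)) >= p  # True with probability 1 - p
    return np.asarray(x) * keep / (1.0 - p)  # zero dropped units, scale survivors


x = np.array([1.0, 2.0, 3.0, 4.0])
print(inverted_dropout(x, p=0.5, rng=np.random.default_rng(0)))  # each entry is 0 or 2 * x
print(inverted_dropout(x, p=0.5, training=False))  # [1. 2. 3. 4.]
```

Because `rng.random` draws uniformly from $[0, 1)$, the comparison `>= p` is true with probability $1-p$, so `keep` is exactly a Bernoulli$(1-p)$ mask.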

Problem statement

Implement a DropoutLayer class:

  • __init__(self, p): store the drop probability $p \in [0, 1)$; raise ValueError otherwise.
  • forward(self, x, training=True):
    • if training is False, return x unchanged;
    • otherwise sample a binary mask $m \sim \text{Bernoulli}(1-p)$ with the same shape as x, store it, and return the following, where $\odot$ denotes elementwise multiplication:
$$y = \frac{x \odot m}{1-p}$$
  • backward(self, grad): route gradients through the same mask and scale, returning $\dfrac{\text{grad} \odot m}{1-p}$.
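One possible shape for the class described above, sketched with NumPy. Two details are assumptions rather than part of the stated API: the optional `rng` constructor argument (a NumPy `Generator`, added only for reproducible masks) and the choice of `RuntimeError` when `backward` runs before any training-mode `forward` (the statement only says to raise).

```python
import numpy as np


class DropoutLayer:
    """Inverted dropout that caches its mask for the backward pass (sketch)."""

    def __init__(self, p, rng=None):
        # p must lie in [0, 1); NaN also fails this check
        if not (0.0 <= p < 1.0):
            raise ValueError(f"p must be in [0, 1), got {p!r}")
        self.p = p
        self.scale = 1.0 / (1.0 - p)
        # Assumption: optional NumPy Generator for reproducible masks
        self.rng = np.random.default_rng() if rng is None else rng
        self.mask = None  # set by a training-mode forward

    def forward(self, x, training=True):
        if not training:
            return x  # identity at inference
        x = np.asarray(x, dtype=float)
        # Bernoulli(1 - p) mask: 1.0 keeps a unit, 0.0 drops it
        self.mask = (self.rng.random(x.shape) >= self.p).astype(float)
        return x * self.mask * self.scale

    def backward(self, grad):
        if self.mask is None:
            # Assumption: RuntimeError; the spec only says "raise"
            raise RuntimeError("backward() called before a training-mode forward()")
        # Same mask and the same 1 / (1 - p) scale as the forward pass
        return np.asarray(grad, dtype=float) * self.mask * self.scale


layer = DropoutLayer(0.5, rng=np.random.default_rng(0))
y = layer.forward([1.0, 2.0, 3.0, 4.0])
dx = layer.backward([0.1, 0.2, 0.3, 0.4])
```

Caching `self.mask` is what lets `backward` reproduce the forward pass exactly: the local derivative of $y$ with respect to $x$ is $m/(1-p)$, so the upstream gradient is multiplied by the same quantity.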

Input

  • p: float, the drop probability in $[0, 1)$.
  • x: np.ndarray of activations.
  • grad: np.ndarray of upstream gradients (same shape as x).
  • training: bool; when False, forward is the identity.

Output

  • forward returns an np.ndarray (scaled, masked activations, or x unchanged at inference).
  • backward returns the gradient w.r.t. the input, using the stored mask and scale.

Examples

Example 1

x = [1, 2, 3, 4], p = 0.5, sampled mask = [1, 0, 1, 0]
forward(x)  -> [2, 0, 6, 0]      # surviving values scaled by 1/(1-0.5) = 2
backward([0.1,0.2,0.3,0.4]) -> [0.2, 0, 0.6, 0]

Explanation: positions 1 and 3 (0-indexed) are dropped (mask 0), and the survivors are scaled by 2. The backward pass reuses the identical mask and scale, so the gradient is 0 at the dropped positions and doubled elsewhere.
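The arithmetic of Example 1 can be reproduced directly by fixing the mask by hand instead of sampling it. This sketch uses plain NumPy operations and does not depend on any particular `DropoutLayer` implementation.

```python
import numpy as np

# Example 1 with the mask fixed by hand instead of sampled
p = 0.5
x = np.array([1.0, 2.0, 3.0, 4.0])
mask = np.array([1.0, 0.0, 1.0, 0.0])
scale = 1.0 / (1.0 - p)  # 2.0

y = x * mask * scale                                          # forward
grad_in = np.array([0.1, 0.2, 0.3, 0.4]) * mask * scale       # backward

print(y.tolist())        # [2.0, 0.0, 6.0, 0.0]
print(grad_in.tolist())  # [0.2, 0.0, 0.6, 0.0]
```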

Constraints

  • Use inverted scaling $1/(1-p)$ on both the forward and backward passes.
  • backward must use the mask saved during forward (raise an error if forward was never called).
  • At inference (training=False), forward returns the input untouched.

Notes

  • Inverted dropout keeps inference cheap: because the expectation is corrected at train time, you do nothing special when evaluating.
  • The expected forward output equals x, because $\mathbb{E}[m]\cdot\frac{1}{1-p} = (1-p)\cdot\frac{1}{1-p} = 1$.
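The unbiasedness claim can be checked empirically by averaging many inverted-dropout draws. This is a minimal Monte Carlo sketch assuming NumPy; the values of `p`, `x`, the seed, and `n_samples` are arbitrary choices for illustration. It also shows the unscaled variant, whose mean shrinks to $(1-p)\,x$.

```python
import numpy as np

# Empirical check that inverted dropout satisfies E[y] = x
rng = np.random.default_rng(42)
p = 0.3
x = np.array([1.0, -2.0, 3.0, 0.5])
n_samples = 200_000

masks = rng.random((n_samples, x.size)) >= p   # one Bernoulli(1 - p) mask per row
samples = x * masks / (1.0 - p)                # inverted-dropout outputs

print(samples.mean(axis=0))      # close to x
print((x * masks).mean(axis=0))  # without rescaling: close to (1 - p) * x
```

With 200,000 draws the standard error of each mean is on the order of $10^{-3}$, so deviations larger than a few hundredths would point to a bug rather than sampling noise.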

This problem ships 6 hidden tests:

  • Inference mode passes input through unchanged
  • Forward: survivors are scaled by 1/(1-p), dropped are 0
  • Backward reuses the same mask and scale
  • p = 0 is a no-op (mask all ones)
  • Invalid p raises ValueError
  • Expected forward output is unbiased over many samples