Dropout layer (forward & backward)
Difficulty: Medium
Tags: numpy, regularization, dropout, neural-network, backpropagation
Background
Dropout is a regularizer that randomly zeroes a fraction of activations during training, forcing the network not to rely on any single unit. The modern form is inverted dropout: surviving activations are scaled up by 1/(1-p) during training so the layer's expected output is unchanged, which lets you leave the layer as a no-op at inference time.
Problem statement
Implement a DropoutLayer class:
- __init__(self, p): store the drop probability p, which must satisfy 0 <= p < 1; raise ValueError otherwise.
- forward(self, x, training=True):
  - if training is False, return x unchanged;
  - otherwise sample a binary mask of x's shape (each entry is 1 with probability 1 - p), store it, and return x * mask / (1 - p).
- backward(self, grad): route gradients through the same mask and scale: grad * mask / (1 - p).
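The three methods above can be sketched as follows. This is one possible shape for the class, not the reference solution: the optional `rng` argument and the choice of `RuntimeError` for a premature `backward` call are assumptions made for illustration, since the statement only says to "raise".

```python
import numpy as np


class DropoutLayer:
    """Inverted dropout layer (illustrative sketch)."""

    def __init__(self, p, rng=None):
        # Assumes valid drop probabilities satisfy 0 <= p < 1;
        # p = 1 would make the 1 / (1 - p) scale undefined.
        if not 0.0 <= p < 1.0:
            raise ValueError(f"p must satisfy 0 <= p < 1, got {p}")
        self.p = p
        self.scale = 1.0 / (1.0 - p)
        # `rng` is a hypothetical extra argument, added for reproducibility.
        self.rng = rng if rng is not None else np.random.default_rng()
        self.mask = None

    def forward(self, x, training=True):
        if not training:
            return x  # identity at inference: the input is returned untouched
        x = np.asarray(x, dtype=float)
        # Each unit survives (mask = 1) with probability 1 - p.
        self.mask = (self.rng.random(x.shape) >= self.p).astype(float)
        return x * self.mask * self.scale

    def backward(self, grad):
        if self.mask is None:
            # Exception type is an assumption; the statement only says "raise".
            raise RuntimeError("backward called before a training-mode forward")
        # Same mask and same scale as the forward pass.
        return np.asarray(grad, dtype=float) * self.mask * self.scale
```

Storing the mask on the instance is what ties the two passes together: a unit that was zeroed in `forward` receives zero gradient in `backward`, and a surviving unit has its gradient multiplied by the same 1/(1-p) factor.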
Input
- p: float drop probability, with 0 <= p < 1.
- x: np.ndarray of activations.
- grad: np.ndarray of upstream gradients (same shape as x).
- training: bool; when False, forward is the identity.
Output
- forward returns an np.ndarray (scaled, masked activations, or x unchanged at inference).
- backward returns the gradient w.r.t. the input, using the stored mask and scale.
Examples
Example 1
x = [1, 2, 3, 4], p = 0.5, sampled mask = [1, 0, 1, 0]
forward(x) -> [2, 0, 6, 0] # surviving values scaled by 1/(1-0.5) = 2
backward([0.1,0.2,0.3,0.4]) -> [0.2, 0, 0.6, 0]
Explanation: indices 1 and 3 (zero-based) are dropped (mask 0); the survivors are scaled by 2. The backward pass reuses the identical mask and scale.
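The arithmetic in Example 1 can be checked directly with NumPy. In this small sketch the mask is hard-coded to the one given in the example rather than sampled, and the variable names are illustrative only.

```python
import numpy as np

p = 0.5
scale = 1.0 / (1.0 - p)                 # 2.0
x = np.array([1.0, 2.0, 3.0, 4.0])
mask = np.array([1.0, 0.0, 1.0, 0.0])   # fixed here; normally sampled at random
grad = np.array([0.1, 0.2, 0.3, 0.4])

out = x * mask * scale          # forward pass
grad_in = grad * mask * scale   # backward pass, same mask and scale

print(out.tolist())      # [2.0, 0.0, 6.0, 0.0]
print(grad_in.tolist())  # [0.2, 0.0, 0.6, 0.0]
```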
Constraints
- Use inverted scaling on both forward and backward.
- backward must use the mask saved during forward (raise if forward was never called).
- At inference (training=False), forward returns the input untouched.
Notes
- Inverted dropout keeps inference cheap: because the expectation is corrected at train time, you do nothing special when evaluating.
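- The expected forward output equals x: E[x * mask / (1 - p)] = x * (1 - p) / (1 - p) = x, since each mask entry is 1 with probability 1 - p.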
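The unbiasedness claim can be checked empirically by averaging many independent forward passes. The sketch below assumes masks are drawn as independent Bernoulli(1 - p) entries; the seed, the value of p, the input values, and the trial count are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen arbitrarily for reproducibility
p = 0.3
x = np.array([1.0, -2.0, 3.0, 4.0])

n_trials = 100_000
# One independent mask per trial; an entry survives with probability 1 - p.
masks = rng.random((n_trials, x.size)) >= p
outputs = x * masks / (1.0 - p)  # inverted-dropout forward, one row per trial

mean_output = outputs.mean(axis=0)
print(mean_output)  # close to x, up to sampling noise
```

Without the 1/(1-p) factor the same average would approach (1 - p) * x instead, which is why non-inverted dropout needs a rescaling step at inference.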
This problem ships 6 hidden tests. They run in your browser via Pyodide, with no backend and no submission queue.
- Inference mode passes input through unchanged
- Forward: survivors are scaled by 1/(1-p), dropped are 0
- Backward reuses the same mask and scale
- p = 0 is a no-op (mask all ones)
- Invalid p raises ValueError
- Expected forward output is unbiased over many samples