Sigmoid forward + backward
Background
The sigmoid $\sigma(x) = 1/(1 + e^{-x})$ squashes any real number into $(0, 1)$, which makes it the activation of choice when you need a probability. It predates the deep-learning era and is still everywhere in binary-classification heads and in gates (e.g. the gates of LSTMs). Implementing it well means two things: a forward pass that stays finite at extreme inputs, and a backward pass that exploits the elegant derivative $\sigma'(x) = \sigma(x)\,(1 - \sigma(x))$.
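The derivative identity that the backward pass relies on follows from one application of the chain rule, using $1 - \sigma(x) = e^{-x}/(1 + e^{-x})$:

$$
\begin{aligned}
\sigma'(x) &= \frac{d}{dx}\,(1 + e^{-x})^{-1} = \frac{e^{-x}}{(1 + e^{-x})^{2}} \\
           &= \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)
\end{aligned}
$$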
Problem statement
Implement two functions:
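- `sigmoid(x)`: the forward pass; must be numerically stable (no NaN/inf even for very large $|x|$).
- `sigmoid_backward(grad_out, sigmoid_out)`: takes the saved output $s = \sigma(x)$ (not the input) and returns the input gradient via $\partial L/\partial x = \text{grad\_out} \cdot s\,(1 - s)$, elementwise.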
Input
- `sigmoid(x)`: `x` is an `np.ndarray` of any shape.
- `sigmoid_backward(grad_out, sigmoid_out)`: `grad_out` is the upstream gradient w.r.t. the output; `sigmoid_out` is the output of the forward pass (same shape).
Output
- `sigmoid` returns an array of the same shape with values in $(0, 1)$ (extreme inputs may round to exactly $0$ or $1$).
- `sigmoid_backward` returns the gradient w.r.t. the input, same shape.
Examples
Example 1 — forward on moderate inputs
Input: x = [-2.0, 0.0, 2.0]
Output: ≈ [0.1192, 0.5, 0.8808]
Explanation: $\sigma(0) = 1/2$ exactly, and the function is symmetric: $\sigma(-x) = 1 - \sigma(x)$, so $0.1192 + 0.8808 = 1$.
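A quick standard-library check of Example 1 and the symmetry claim. `sigmoid_scalar` is a hypothetical helper that uses the naive formula, which is adequate here only because the inputs are moderate.

```python
import math

def sigmoid_scalar(x: float) -> float:
    # Naive formula: fine for moderate |x|, but math.exp(-x) raises
    # OverflowError once -x exceeds roughly 709.
    return 1.0 / (1.0 + math.exp(-x))

values = [round(sigmoid_scalar(x), 4) for x in (-2.0, 0.0, 2.0)]
print(values)  # [0.1192, 0.5, 0.8808]

# Symmetry: sigma(-x) == 1 - sigma(x), up to floating-point rounding.
for x in (0.5, 2.0, 10.0):
    assert math.isclose(sigmoid_scalar(-x), 1.0 - sigmoid_scalar(x), abs_tol=1e-12)
```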
Example 2 — backward, including the saturated tails
Input: x = [-100, 0, 100] -> s = sigmoid(x) ≈ [0, 0.5, 1]
grad_out = [1, 1, 1]
Output: ≈ [0.0, 0.25, 0.0]
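Explanation: the gradient is $\text{grad\_out} \cdot s\,(1 - s)$. At $x = 0$, $s = 0.5$ gives the maximum $0.5 \cdot 0.5 = 0.25$; at the saturated tails $s \approx 0$ or $s \approx 1$, so $s\,(1 - s) \approx 0$. This is the vanishing gradient.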
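A minimal NumPy sketch reproducing Example 2. It uses the naive forward formula only because $|x| = 100$ is still far from the float64 overflow point of `exp` (about 709); the backward sees nothing but the cached output `s`.

```python
import numpy as np

def sigmoid_backward(grad_out: np.ndarray, sigmoid_out: np.ndarray) -> np.ndarray:
    # Uses only the cached forward output s: dL/dx = grad_out * s * (1 - s).
    return grad_out * sigmoid_out * (1.0 - sigmoid_out)

x = np.array([-100.0, 0.0, 100.0])
# Naive forward is still finite at |x| = 100 (exp(100) is about 2.7e43).
s = 1.0 / (1.0 + np.exp(-x))
grad_in = sigmoid_backward(np.ones_like(x), s)
print(grad_in)  # approximately [0, 0.25, 0]
```

The two tails are not identical in floating point: at $x = 100$, $s$ rounds to exactly $1.0$, so the gradient is exactly $0.0$, while at $x = -100$ it is a tiny positive number (about $3.7 \times 10^{-44}$). Both are effectively the $0.0$ shown in the example.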
Constraints
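- Numerical stability: the naive `1/(1+exp(-x))` overflows for very negative $x$ (since `exp(-x)` → inf). Branch on the sign so the `exp` argument is always $\le 0$: use `exp(x)/(1+exp(x))` for $x < 0$ and the original form for $x \ge 0$. Both equal $\sigma(x)$; for a given sign, only one keeps its intermediate `exp` finite.
- Output values lie in $(0, 1)$; for large $|x|$ the result saturates to $0$ / $1$ with no NaN/inf.
- `sigmoid_backward` consumes the output $s$, not the input $x$: $\partial L/\partial x = \text{grad\_out} \cdot s\,(1 - s)$.
- The analytic backward must match a finite-difference gradient (`atol≈1e-4`); $\sigma'(0) = 0.25$ exactly.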
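One way to satisfy these constraints is sketched below; it is a possible implementation, not necessarily the reference solution. Instead of two masked branches it computes $z = e^{-|x|} \le 1$ once and picks the form by sign, which is the same branch-on-sign idea: for $x \ge 0$, $z = e^{-x}$ and $\sigma(x) = 1/(1+z)$; for $x < 0$, $z = e^{x}$ and $\sigma(x) = z/(1+z)$. Because $z$ never exceeds 1, neither `np.where` branch can overflow, even though `np.where` evaluates both.

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    x = np.asarray(x, dtype=np.float64)
    # z = exp(-|x|) lies in (0, 1]: it may underflow to 0 but never overflows.
    z = np.exp(-np.abs(x))
    # x >= 0: 1 / (1 + exp(-x));  x < 0: exp(x) / (1 + exp(x)).
    return np.where(x >= 0, 1.0 / (1.0 + z), z / (1.0 + z))

def sigmoid_backward(grad_out: np.ndarray, sigmoid_out: np.ndarray) -> np.ndarray:
    s = np.asarray(sigmoid_out, dtype=np.float64)
    # Local derivative from the cached output: ds/dx = s * (1 - s).
    return np.asarray(grad_out, dtype=np.float64) * s * (1.0 - s)

x = np.array([[-1000.0, -2.0], [0.0, 1000.0]])
s = sigmoid(x)
print(s)                                      # finite everywhere, same shape as x
print(sigmoid_backward(np.ones_like(x), s))   # gradient, same shape as x
```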
Notes
- Why pass the output. Saving $s = \sigma(x)$ makes the backward a cheap elementwise product with no extra `exp` calls, which is why framework activation APIs hand the backward the cached output.
- Saturation = vanishing gradient. Since $s\,(1 - s) \le 0.25$ everywhere and $\approx 0$ at the tails, gradients shrink layer after layer; this is exactly the failure mode that kills learning in deep sigmoid stacks, and a major reason ReLU (see ReLU forward + backward) was adopted instead.
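To illustrate both notes, the sketch below chains `depth` sigmoids with no weights in between (a deliberately simplified stack; `stacked_input_grad` is a hypothetical helper) and backpropagates using only the cached outputs. Each layer multiplies the gradient by $s\,(1 - s) \le 0.25$, so the input gradient is bounded by $0.25^{\text{depth}}$.

```python
import numpy as np

def sigmoid(x):
    z = np.exp(-np.abs(x))
    return np.where(x >= 0, 1.0 / (1.0 + z), z / (1.0 + z))

def sigmoid_backward(grad_out, sigmoid_out):
    return grad_out * sigmoid_out * (1.0 - sigmoid_out)

def stacked_input_grad(x: float, depth: int) -> float:
    # Forward: apply sigmoid `depth` times, caching every output.
    h = np.array(x, dtype=np.float64)
    cache = []
    for _ in range(depth):
        h = sigmoid(h)
        cache.append(h)
    # Backward: no exp calls, one product per layer from the cached outputs.
    grad = np.array(1.0)
    for s in reversed(cache):
        grad = sigmoid_backward(grad, s)
    return float(grad)

for depth in (1, 5, 10, 20):
    g = stacked_input_grad(0.0, depth)
    print(depth, g, g <= 0.25 ** depth)  # the bound holds at every depth
```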
This problem ships 6 hidden tests. They run in your browser via Pyodide (no backend, no submission queue).
- `sigmoid`: output shape matches input
- `sigmoid`: matches the formula on a moderate-magnitude input
- `sigmoid`: numerically stable at extreme inputs (no NaN, no inf)
- `sigmoid_backward`: matches `s * (1 - s) * grad_out`
- `sigmoid_backward`: matches finite-difference numerical gradient
- `sigmoid_backward`: gradient saturates at extreme inputs (≈ 0)
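A sketch of what a finite-difference check like the fifth test might look like. The hidden tests themselves are not shown, so the central-difference step `eps`, the random inputs, and the helper name `numerical_grad` are assumptions; only the `atol≈1e-4` tolerance comes from the constraints. The check differentiates the scalar loss $L = \sum \text{grad\_out} \cdot \sigma(x)$, whose exact gradient is $\text{grad\_out} \cdot s\,(1 - s)$.

```python
import numpy as np

def sigmoid(x):
    x = np.asarray(x, dtype=np.float64)
    z = np.exp(-np.abs(x))
    return np.where(x >= 0, 1.0 / (1.0 + z), z / (1.0 + z))

def sigmoid_backward(grad_out, sigmoid_out):
    return grad_out * sigmoid_out * (1.0 - sigmoid_out)

def numerical_grad(x, grad_out, eps=1e-6):
    # Central differences of L(x) = sum(grad_out * sigmoid(x)), one entry at a time.
    num = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        x_plus, x_minus = x.copy(), x.copy()
        x_plus[idx] += eps
        x_minus[idx] -= eps
        loss_plus = np.sum(grad_out * sigmoid(x_plus))
        loss_minus = np.sum(grad_out * sigmoid(x_minus))
        num[idx] = (loss_plus - loss_minus) / (2 * eps)
    return num

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
grad_out = rng.normal(size=(3, 4))

analytic = sigmoid_backward(grad_out, sigmoid(x))
numeric = numerical_grad(x, grad_out)
print(np.max(np.abs(analytic - numeric)))  # expected to be far below 1e-4
assert np.allclose(analytic, numeric, atol=1e-4)
```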