Sigmoid forward + backward

Difficulty: Easy

Background

The sigmoid $\sigma(x) = 1/(1 + e^{-x})$ squashes any real number into $(0, 1)$, which makes it the activation of choice when you need a probability. It predates the deep-learning era and is still everywhere in binary classification heads and gates. Implementing it well means two things: a forward pass that stays finite at extreme inputs, and a backward pass that exploits the elegant $s(1-s)$ derivative.

Problem statement

Implement two functions:

$$\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad\qquad \frac{\partial L}{\partial x} = \frac{\partial L}{\partial y}\cdot s\,(1 - s) \quad\text{where } s = \sigma(x)$$

sigmoid(x) must be numerically stable (no NaN/inf even at $x = \pm 1000$). sigmoid_backward(grad_out, sigmoid_out) takes the saved output $s$ (not the input) and returns the input gradient, grad_out multiplied elementwise by $s(1-s)$.
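
For reference, the $s(1-s)$ form follows from differentiating the definition directly and then applying the chain rule. This is the standard derivation, written with the same symbols as above:

$$\sigma'(x) = \frac{e^{-x}}{(1 + e^{-x})^2} = \frac{1}{1 + e^{-x}}\cdot\frac{e^{-x}}{1 + e^{-x}} = \sigma(x)\,\bigl(1 - \sigma(x)\bigr) = s\,(1 - s)$$

$$\frac{\partial L}{\partial x} = \frac{\partial L}{\partial y}\cdot\sigma'(x) = \frac{\partial L}{\partial y}\cdot s\,(1 - s)$$

The middle step uses $1 - \sigma(x) = e^{-x}/(1 + e^{-x})$.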

Input

  • sigmoid(x): x is an np.ndarray of any shape.
  • sigmoid_backward(grad_out, sigmoid_out): grad_out is the upstream gradient w.r.t. the output; sigmoid_out is the output $s$ of the forward pass (same shape).

Output

  • sigmoid returns an array of the same shape with values in $(0, 1)$.
  • sigmoid_backward returns the gradient w.r.t. the input, same shape.

Examples

Example 1 — forward on moderate inputs

Input:  x = [-2.0, 0.0, 2.0]
Output: ≈ [0.1192, 0.5, 0.8808]

Explanation: $\sigma(0) = 0.5$ exactly, and the function is symmetric: $\sigma(-2) = 1 - \sigma(2) \approx 0.1192$.

Example 2 — backward, including the saturated tails

Input:  x = [-100, 0, 100]  ->  s = sigmoid(x) ≈ [0, 0.5, 1]
        grad_out = [1, 1, 1]
Output: ≈ [0.0, 0.25, 0.0]

Explanation: the gradient is $s(1-s)$. At $x=0$, $s=0.5$ gives the maximum $0.5\cdot0.5 = 0.25$; at the saturated tails $s\approx 0$ or $1$, so $s(1-s)\approx 0$, which is the vanishing gradient.
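
A minimal sketch of the backward pass applied to Example 2. It assumes the saturated outputs are rounded to exactly 0 and 1 (in float64 the lower tail is a tiny positive number rather than exactly 0), and it is one possible implementation rather than the reference solution:

```python
import numpy as np

def sigmoid_backward(grad_out, sigmoid_out):
    # Gradient w.r.t. the input, computed from the cached output s only.
    return grad_out * sigmoid_out * (1.0 - sigmoid_out)

# Outputs from Example 2, with the tails rounded to 0 and 1 (an assumption).
s = np.array([0.0, 0.5, 1.0])
grad_out = np.ones_like(s)

grad_in = sigmoid_backward(grad_out, s)
print(grad_in)  # approximately [0.0, 0.25, 0.0]
```

No exp call appears in the backward: everything it needs is already contained in s.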

Constraints

  • Numerical stability: the naive 1/(1+exp(-x)) overflows for very negative $x$ (since exp(-x)→inf). Branch on the sign so the exp argument is always $\le 0$: use exp(x)/(1+exp(x)) for $x < 0$, and the original form for $x \ge 0$. Both forms equal $\sigma$ mathematically; on each side of zero only one of them avoids overflow in exp.
  • Output values lie in $(0, 1)$ mathematically; at $x = \pm 1000$ the floating-point result saturates to $\approx 0$ / $\approx 1$ with no NaN/inf.
  • sigmoid_backward consumes the output $s$, not the input: $\dfrac{\partial L}{\partial x} = \text{grad\_out}\cdot s(1-s)$.
  • The analytic backward must match a finite-difference gradient (atol≈1e-4); $\sigma'(0) = 0.25$ exactly.
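
The sign-branching constraint above can be sketched as follows. This is one way to do it, not necessarily the reference solution: it folds the two branches into a single exp(-|x|) call, and the cast to float64 is an assumption about the expected dtype.

```python
import numpy as np

def sigmoid(x):
    x = np.asarray(x, dtype=np.float64)  # assumption: float64 output is acceptable
    # exp(-|x|) always has a non-positive argument, so it cannot overflow.
    z = np.exp(-np.abs(x))
    # x >= 0: z = exp(-x), so sigma = 1 / (1 + z)
    # x <  0: z = exp(x),  so sigma = z / (1 + z)
    return np.where(x >= 0, 1.0 / (1.0 + z), z / (1.0 + z))

out = sigmoid(np.array([-1000.0, -2.0, 0.0, 2.0, 1000.0]))
print(out)  # approximately [0, 0.1192, 0.5, 0.8808, 1]
```

np.where evaluates both branches for every element, which is safe here because z always lies in [0, 1], so neither expression can produce inf or NaN.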

Notes

  • Why pass the output. Saving $s$ reduces the backward to a couple of elementwise multiplies with no extra exp calls, which is why frameworks commonly hand the backward the cached output.
  • Saturation = vanishing gradient. $s(1-s)\to 0$ at the tails is exactly the failure mode that stalls learning in deep sigmoid stacks, and a main reason ReLU (see ReLU forward + backward) was adopted instead.

This problem ships 6 hidden tests. They run in your browser via Pyodide, with no backend and no submission queue.

  • sigmoid: output shape matches input
  • sigmoid: matches the formula on a moderate-magnitude input
  • sigmoid: numerically stable at extreme inputs (no NaN, no inf)
  • sigmoid_backward: matches s * (1 - s) * grad_out
  • sigmoid_backward: matches finite-difference numerical gradient
  • sigmoid_backward: gradient saturates at extreme inputs (≈ 0)
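
A finite-difference check like the one in the test list might look like the sketch below. The hidden tests are not shown, so the step size, random inputs, and helper name plain_sigmoid are illustrative assumptions. The inputs are moderate, so the plain formula is safe here.

```python
import numpy as np

def plain_sigmoid(x):
    # Hypothetical helper: the unbranched formula, fine for moderate inputs only.
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_backward(grad_out, sigmoid_out):
    return grad_out * sigmoid_out * (1.0 - sigmoid_out)

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
grad_out = rng.normal(size=x.shape)

analytic = sigmoid_backward(grad_out, plain_sigmoid(x))

# Central difference of L(x) = sum(grad_out * sigmoid(x)). Sigmoid is
# elementwise, so each input can be perturbed independently in one shot.
eps = 1e-5
numeric = grad_out * (plain_sigmoid(x + eps) - plain_sigmoid(x - eps)) / (2 * eps)

max_err = np.max(np.abs(analytic - numeric))
print(max_err < 1e-4)
```

The central difference has error on the order of eps squared, so agreement should be far tighter than the stated atol of about 1e-4.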