Swish / SiLU activation

Difficulty: Easy

Background

Swish (a.k.a. SiLU, $x\cdot\sigma(x)$) is a smooth, non-monotonic activation found by neural-architecture search. It has been reported to slightly outperform ReLU in deep networks, and it is the gate inside SwiGLU. Unlike ReLU, it has a small negative response and a continuous derivative everywhere.

Problem statement

Implement swish(x, beta=1.0):

$$\text{swish}(x) = x\,\sigma(\beta x) = \frac{x}{1 + e^{-\beta x}}$$

Input

  • x (np.ndarray): input (any shape).
  • beta (float): gate sharpness (default 1.0 = SiLU).

Output

Returns an np.ndarray of the same shape.
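
One possible way to satisfy this specification is sketched below. It is not the reference solution. It assumes NumPy only and evaluates the sigmoid through `exp(-|z|)`, an illustrative choice that avoids overflow warnings for large negative `beta * x`; the direct formula `x / (1 + np.exp(-beta * x))` gives the same values within the test tolerance.

```python
import numpy as np

def swish(x, beta=1.0):
    """Elementwise x * sigmoid(beta * x); the result has the same shape as x."""
    x = np.asarray(x, dtype=float)
    z = beta * x
    # exp(-|z|) always lies in (0, 1], so it cannot overflow.
    e = np.exp(-np.abs(z))
    # sigmoid(z) = 1 / (1 + e) for z >= 0, and e / (1 + e) for z < 0.
    sig = np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))
    return x * sig
```

Calling `swish(np.zeros((2, 3)))` returns a `(2, 3)` array of zeros, since the function is applied elementwise.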

Examples

Example 1

Input:  x = [-1, 0, 1, 2], beta = 1.0
Output: [-0.2689, 0.0, 0.7311, 1.7616]

Explanation: at $x=0$, swish $=0$; at $x=1$, $1\cdot\sigma(1)=0.7311$; the negative input gives a small negative output, $-1\cdot\sigma(-1)=-0.2689$.
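
The arithmetic in this example can be checked with a few lines of NumPy. The `sigmoid` helper is an illustrative name introduced here, not part of the problem's API.

```python
import numpy as np

def sigmoid(z):
    # Logistic function 1 / (1 + e^{-z}).
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([-1.0, 0.0, 1.0, 2.0])
beta = 1.0
out = x * sigmoid(beta * x)

# Round to four decimals to compare with the listed output.
print(np.round(out, 4))
```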

Constraints

  • swish $= x\,\sigma(\beta x)$, applied elementwise.
  • As $\beta\to 0$, swish approaches the linear function $x/2$; for large $\beta$ it approaches ReLU.
  • Tests compare with atol=1e-4.
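
The two limiting cases in the constraints can be illustrated numerically. The sketch below uses the direct formula; the grid and the specific `beta` values are arbitrary choices made for this illustration.

```python
import numpy as np

def swish(x, beta=1.0):
    x = np.asarray(x, dtype=float)
    return x / (1.0 + np.exp(-beta * x))

x = np.linspace(-3.0, 3.0, 601)

# Tiny beta: sigmoid(beta * x) is close to 1/2, so swish is close to x / 2.
small_gap = np.max(np.abs(swish(x, beta=1e-6) - x / 2))

# Large beta: the gate is close to a step function, so swish is close to ReLU.
large_gap = np.max(np.abs(swish(x, beta=100.0) - np.maximum(x, 0.0)))

print(small_gap, large_gap)
```

Because swish with parameter $\beta$ is a rescaled copy of SiLU, the worst-case gap to ReLU shrinks roughly like $0.28/\beta$, so the approximation is loosest close to $x=0$ and tight elsewhere.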

Notes

  • swish$(0)=0$. The function is unbounded above but bounded below: for $\beta=1$ it dips to about $-0.28$ near $x\approx-1.28$. The nonzero response and gradient for negative inputs can help gradient flow compared with ReLU's hard zero.
  • SiLU is the $\beta=1$ special case used in EfficientNet and as the SwiGLU gate.
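
The location of the dip and the nonzero slope on the negative side can be checked numerically. This is a rough sketch using a dense grid search and a central finite difference; both are illustrative choices, not part of the problem.

```python
import numpy as np

def swish(x, beta=1.0):
    x = np.asarray(x, dtype=float)
    return x / (1.0 + np.exp(-beta * x))

# Dense grid search for the minimum of SiLU (beta = 1) on [-5, 0].
grid = np.linspace(-5.0, 0.0, 500_001)
vals = swish(grid)
i = np.argmin(vals)
x_min, y_min = grid[i], vals[i]  # roughly -1.28 and -0.28

# Central finite difference at x = -1: the slope is small but not zero,
# whereas ReLU has slope exactly 0 for every negative input.
h = 1e-5
slope = (swish(-1.0 + h) - swish(-1.0 - h)) / (2 * h)

print(x_min, y_min, slope)
```

For other values of `beta` the minimum moves to about $-1.28/\beta$, since the function is a rescaled copy of the $\beta=1$ case.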

This problem ships 4 hidden tests, which run in the browser via Pyodide:

  • Reference values
  • swish(0) = 0
  • Matches x * sigmoid(beta*x)
  • Large beta approaches ReLU on positives