SwiGLU activation
Difficulty: Medium
Tags: numpy, activation, transformer, llm, glu
Background
SwiGLU is the gated activation in the feed-forward block of modern LLMs (PaLM, LLaMA). A GLU (gated linear unit) splits its input in half and lets one half gate the other; SwiGLU uses the Swish gate $\mathrm{Swish}(x) = x \cdot \sigma(x)$. It has been reported to outperform plain ReLU/GELU feed-forward layers at the same parameter budget.
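As a rough illustration of the gate, the sketch below evaluates Swish with NumPy. The helper names `sigmoid` and `swish` are illustrative and not part of the problem; the sigmoid is written via `np.logaddexp`, a choice made here so that large-magnitude inputs do not overflow.

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + exp(-x)) = exp(-log(1 + exp(-x))).
    # np.logaddexp(0, -x) evaluates log(1 + exp(-x)) without overflow.
    return np.exp(-np.logaddexp(0.0, -x))

def swish(x):
    # Swish(x) = x * sigma(x)
    return x * sigmoid(x)

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(swish(x))
```

Swish is non-monotonic on the negative side: `swish(-10.0)` is closer to zero than `swish(-1.0)`, even though -10 is the smaller input.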
Problem statement
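Implement SwiGLU(x) for input x of shape (batch, 2d). Split the last dimension into halves $x_1$ and $x_2$ and return:
$$\mathrm{SwiGLU}(x) = x_1 \odot \mathrm{Swish}(x_2) = x_1 \odot \big(x_2 \cdot \sigma(x_2)\big)$$
where $\sigma$ is the sigmoid.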
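One possible implementation is sketched below. It is a minimal version under the stated shape assumptions, not the reference solution; the function name `swiglu`, the explicit even-length check, and the `np.logaddexp`-based sigmoid are choices made for this sketch.

```python
import numpy as np

def swiglu(x):
    x = np.asarray(x, dtype=float)
    if x.shape[-1] % 2 != 0:
        raise ValueError("last dimension must be even")
    # First half is the value x1; second half x2 feeds the Swish gate.
    x1, x2 = np.split(x, 2, axis=-1)
    # Overflow-safe sigmoid: exp(-log(1 + exp(-x2))).
    sig = np.exp(-np.logaddexp(0.0, -x2))
    return x1 * (x2 * sig)

out = swiglu(np.array([[1.0, 2.0, 0.0, 3.0]]))
print(out.shape)  # (1, 2)
```

Splitting along `axis=-1` means the same function also works for inputs with extra leading dimensions, such as (batch, seq, 2d).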
Input
x: np.ndarray of shape (batch_size, 2d).
Output
Returns an np.ndarray of shape (batch_size, d).
Examples
Example 1
Input: [[1, -1, 1000, -1000]]
Output: [[1000.0, 0.0]]
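Explanation: split into $x_1 = [1, -1]$ and $x_2 = [1000, -1000]$. $\sigma(1000) \approx 1$ so $\mathrm{Swish}(1000) \approx 1000$; $\sigma(-1000) \approx 0$ so $\mathrm{Swish}(-1000) \approx 0$. Then $x_1 \odot \mathrm{Swish}(x_2) = [1 \cdot 1000, -1 \cdot 0] = [1000, 0]$.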
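The extreme values in this example are where a naive sigmoid can go wrong. The sketch below compares one textbook form against an overflow-safe one; the function names are illustrative, and the warnings are silenced with `np.errstate` only to keep the demonstration quiet.

```python
import numpy as np

x2 = np.array([1000.0, -1000.0])

def sigmoid_exp_ratio(z):
    # exp(z) / (1 + exp(z)): exp(1000) overflows to inf, so inf / inf = nan.
    return np.exp(z) / (1.0 + np.exp(z))

def sigmoid_stable(z):
    # exp(-log(1 + exp(-z))), with the log term computed by np.logaddexp.
    return np.exp(-np.logaddexp(0.0, -z))

with np.errstate(over="ignore", invalid="ignore"):
    naive = sigmoid_exp_ratio(x2)

stable = sigmoid_stable(x2)
print(np.isnan(naive))   # [ True False]
print(stable)            # [1. 0.]
```

A NaN in the gate would propagate into the output, so an overflow-safe sigmoid is one way to keep results finite for inputs like these.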
Constraints
- The last dimension is even; split it into equal halves $x_1$ (first) and $x_2$ (second).
- Gate with $\mathrm{Swish}(x_2) = x_2 \cdot \sigma(x_2)$; output $x_1 \odot \mathrm{Swish}(x_2)$.
- The output has half the last-dimension size: (batch, d).
Notes
- The GLU family lets the network learn which features to pass through. Swish is a smooth, non-monotonic gate that has performed well empirically.
- In a real FFN, $x_1$ and $x_2$ come from two separate linear projections of the input; here they are handed to you pre-projected as the two halves.
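To illustrate the last note, here is a hedged sketch of a SwiGLU feed-forward block. The weight names `W_value`, `W_gate`, and `W_out`, the sizes, and the random initialization are all hypothetical, and bias terms are omitted as a simplification. The sketch checks that packing the two projections side by side and splitting them again reproduces the same hidden activations.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, d_model, d_hidden = 4, 8, 16  # hypothetical sizes

# Hypothetical weights: two input projections and one output projection.
W_value = rng.normal(size=(d_model, d_hidden))
W_gate = rng.normal(size=(d_model, d_hidden))
W_out = rng.normal(size=(d_hidden, d_model))

def sigmoid(z):
    # Overflow-safe sigmoid: exp(-log(1 + exp(-z))).
    return np.exp(-np.logaddexp(0.0, -z))

h = rng.normal(size=(batch, d_model))   # block input
x1 = h @ W_value                        # value half
x2 = h @ W_gate                         # gate half
hidden = x1 * (x2 * sigmoid(x2))        # SwiGLU on the two projections
y = hidden @ W_out                      # project back to d_model

# Same computation in this problem's pre-projected layout (batch, 2d).
packed = np.concatenate([x1, x2], axis=-1)
p1, p2 = np.split(packed, 2, axis=-1)
assert np.allclose(hidden, p1 * (p2 * sigmoid(p2)))
print(packed.shape, hidden.shape, y.shape)  # (4, 32) (4, 16) (4, 8)
```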
This problem ships 4 hidden tests. They run in your browser via Pyodide, with no backend and no submission queue.
- Reference example
- Output halves the last dimension
- Matches x1 * (x2 * sigmoid(x2))
- Finite for extreme inputs