RMSNorm
Difficulty: Easy
Tags: numpy, normalization, transformer, llm, rmsnorm
Background
RMSNorm (Root-Mean-Square Norm) is the simplified LayerNorm used in LLaMA, T5, and many recent LLMs. It rescales activations by their root-mean-square and drops LayerNorm's mean subtraction, which removes the cost of computing and subtracting the mean while performing comparably in practice. Only a learnable per-feature gain is applied (no bias).
Problem statement
Implement rmsnorm(x, gamma, epsilon=1e-8), normalizing over the last (feature) axis:
$$y = \frac{x}{\sqrt{\frac{1}{D}\sum_{i=1}^{D} x_i^2 + \epsilon}} \odot \gamma$$
Input
- x: np.ndarray of shape (..., D).
- gamma: np.ndarray of shape (D,), the per-feature gain.
- epsilon: float.
Output
Returns an np.ndarray of the same shape as x.
Examples
Example 1
Input: x = [[1, 2, 3, 4]], gamma = [1, 1, 1, 1]
Output: [[0.3651, 0.7303, 1.0954, 1.4606]]
Explanation: the mean of squares is $(1 + 4 + 9 + 16)/4 = 7.5$, so the RMS is $\sqrt{7.5} \approx 2.7386$; dividing each element by it (and multiplying by $\gamma = 1$) gives the output.
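The arithmetic in Example 1 can be checked step by step with NumPy. This is only a small sketch: epsilon is left out because, at 1e-8, it does not affect the four digits shown, and the variable names are illustrative.

```python
import numpy as np

x = np.array([[1.0, 2.0, 3.0, 4.0]])
gamma = np.ones(4)

# Mean of squares over the last axis: (1 + 4 + 9 + 16) / 4 = 7.5
mean_sq = np.mean(x ** 2, axis=-1, keepdims=True)
# Root-mean-square: sqrt(7.5), roughly 2.7386
rms = np.sqrt(mean_sq)
# Divide by the RMS, then apply the per-feature gain
y = x / rms * gamma

print(np.round(y, 4))
```

The printed values should agree with the example output to within the test tolerance.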
Constraints
- Normalize over the last axis; use the mean of squares (divide by $D$), not the sum.
- No mean subtraction and no bias; only the per-feature gain $\gamma$.
- Tests compare with atol=1e-4.
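One possible implementation that satisfies these constraints is sketched below. It is not necessarily the reference solution: adding epsilon inside the square root and casting inputs to float64 are assumptions made here, and at a tolerance of 1e-4 the epsilon placement does not change the result for the example.

```python
import numpy as np

def rmsnorm(x, gamma, epsilon=1e-8):
    """Sketch of RMSNorm over the last axis (epsilon placement is an assumption)."""
    x = np.asarray(x, dtype=np.float64)
    gamma = np.asarray(gamma, dtype=np.float64)
    # Mean of squares over the feature axis; keepdims=True keeps a trailing
    # axis of size 1 so the result broadcasts against x of shape (..., D).
    mean_sq = np.mean(x ** 2, axis=-1, keepdims=True)
    # Assumption: epsilon is added inside the square root for stability.
    rms = np.sqrt(mean_sq + epsilon)
    # Per-feature gain only: no mean subtraction, no bias.
    return x / rms * gamma

out = rmsnorm([[1, 2, 3, 4]], [1, 1, 1, 1])
expected = np.array([[0.3651, 0.7303, 1.0954, 1.4606]])
print(np.allclose(out, expected, atol=1e-4))
```

Because gamma has shape (D,), it broadcasts across any leading batch dimensions, so the output keeps the shape of x.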
Notes
- Versus LayerNorm, RMSNorm skips centering ($x - \mu$); empirically the re-centering contributes little, so dropping it saves compute.
- After RMSNorm each row has unit root-mean-square (when $\gamma = 1$), i.e. its L2 norm is $\sqrt{D}$.
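Both notes can be checked numerically. The sketch below uses random data with a deliberately non-zero mean; the LayerNorm variant (no gain or bias) is included only for contrast, and the epsilon placement inside the square root is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8
x = rng.normal(loc=3.0, scale=2.0, size=(5, D))  # non-zero mean on purpose

# RMSNorm with gamma = 1 (epsilon inside the square root is an assumption).
y_rms = x / np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + 1e-8)

# LayerNorm without gain or bias: subtract the mean, divide by the std.
mu = np.mean(x, axis=-1, keepdims=True)
y_ln = (x - mu) / np.sqrt(np.var(x, axis=-1, keepdims=True) + 1e-8)

row_rms = np.sqrt(np.mean(y_rms ** 2, axis=-1))  # RMS of each output row
row_l2 = np.linalg.norm(y_rms, axis=-1)          # L2 norm of each output row

print(np.allclose(row_rms, 1.0, atol=1e-4))        # is each row's RMS 1?
print(np.allclose(row_l2, np.sqrt(D), atol=1e-4))  # is each row's L2 norm sqrt(D)?
print(y_ln.mean(axis=-1))   # LayerNorm rows are centered (means near 0)
print(y_rms.mean(axis=-1))  # RMSNorm rows keep a non-zero mean
```

The last two lines show the practical difference: LayerNorm output rows are centered, while RMSNorm only rescales and leaves the row mean in place.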
This problem ships 4 hidden tests. They run in your browser via Pyodide, with no backend and no submission queue.
- Reference example
- Each row has unit RMS when gamma = 1
- gamma scales per feature
- Shape is preserved