RMSNorm (Easy)

Background

RMSNorm (Root-Mean-Square Norm) is a simplified variant of LayerNorm used in LLaMA, T5, and many recent LLMs. It rescales activations by their root-mean-square and drops LayerNorm's mean subtraction, which makes it cheaper while performing comparably in practice. Only a learnable per-feature gain $\gamma$ is applied (no bias).

Problem statement

Implement rmsnorm(x, gamma, epsilon=1e-8), normalizing over the last (feature) axis:

$$\text{RMS}(x) = \sqrt{\frac{1}{D}\sum_{i=1}^{D} x_i^2 + \epsilon}, \qquad y = \frac{x}{\text{RMS}(x)} \odot \gamma$$
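A minimal NumPy sketch of the formula above, assuming gamma has shape (D,) so that it broadcasts against the last axis of x. Casting the input to float64 is an assumption made here so that integer inputs like the example below work; it is one straightforward way to write the function, not the only acceptable solution.

```python
import numpy as np

def rmsnorm(x: np.ndarray, gamma: np.ndarray, epsilon: float = 1e-8) -> np.ndarray:
    # Assumption: compute in float64 so integer inputs are handled.
    x = np.asarray(x, dtype=np.float64)
    # Mean of squares over the last (feature) axis; keepdims gives shape (..., 1).
    mean_sq = np.mean(x ** 2, axis=-1, keepdims=True)
    # epsilon sits inside the square root, matching RMS(x) above.
    rms = np.sqrt(mean_sq + epsilon)
    # Normalize each row, then apply the per-feature gain along the last axis.
    return x / rms * gamma

print(np.round(rmsnorm(np.array([[1, 2, 3, 4]]), np.ones(4)), 4))
# [[0.3651 0.7303 1.0954 1.4606]]
```

Because keepdims=True leaves a trailing axis of size 1, the division broadcasts row-wise for any number of leading axes, and the (D,)-shaped gamma then broadcasts along the feature axis.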

Input

  • x: np.ndarray of shape (..., D).
  • gamma: np.ndarray of shape (D,), the per-feature gain.
  • epsilon: float, a small constant added inside the square root for numerical stability.

Output

Returns an np.ndarray of the same shape as x.

Examples

Example 1

Input:  x = [[1, 2, 3, 4]], gamma = [1, 1, 1, 1]
Output: [[0.3651, 0.7303, 1.0954, 1.4606]]

Explanation: the mean of squares is $(1+4+9+16)/4 = 7.5$, so $\text{RMS} = \sqrt{7.5} \approx 2.7386$ ($\epsilon = 10^{-8}$ is negligible here); dividing each element by it (and multiplying by $\gamma = 1$) gives the output.
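The arithmetic in the explanation can be retraced step by step with plain Python. This sketch only reproduces the numbers for the example row and leaves out ε, since 1e-8 does not change the result at four decimal places.

```python
import math

row = [1, 2, 3, 4]
mean_sq = sum(v * v for v in row) / len(row)   # (1 + 4 + 9 + 16) / 4 = 7.5
rms = math.sqrt(mean_sq)                       # sqrt(7.5), about 2.7386
normalized = [round(v / rms, 4) for v in row]  # gamma is all ones, so no rescaling
print(mean_sq, round(rms, 4), normalized)
# 7.5 2.7386 [0.3651, 0.7303, 1.0954, 1.4606]
```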

Constraints

  • Normalize over the last axis; use the mean of squares (divide by $D$), not the sum.
  • No mean subtraction and no bias; only the per-feature gain $\gamma$.
  • Tests compare with atol=1e-4.
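To illustrate why the divide-by-$D$ constraint matters, the sketch below compares the correct mean of squares with the sum of squares on the example row (the variable names are illustrative only). Using the sum shrinks every output by a factor of about $\sqrt{D}$, which for $D = 4$ halves the example output and would fail the atol=1e-4 comparison.

```python
import numpy as np

x = np.array([[1.0, 2.0, 3.0, 4.0]])
eps = 1e-8
D = x.shape[-1]

# Correct: mean of squares (divide by D).
rms_mean = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
# Incorrect: sum of squares (no division by D).
rms_sum = np.sqrt(np.sum(x ** 2, axis=-1, keepdims=True) + eps)

y_mean = x / rms_mean
y_sum = x / rms_sum

# With the sum, outputs are roughly sqrt(D) = 2 times too small.
print(np.round(y_mean, 4))
print(np.round(y_sum, 4))
print(y_mean / y_sum)
```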

Notes

  • Versus LayerNorm, RMSNorm skips centering ($x - \mu$); empirically the re-centering contributes little, so dropping it saves the cost of computing and subtracting the mean.
  • After RMSNorm with $\gamma = 1$, each row has unit root-mean-square (up to the small effect of $\epsilon$), so its L2 norm is $\sqrt{D}$.
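Both notes can be checked numerically. In this sketch, layernorm_no_affine is a hypothetical helper written out by hand (LayerNorm without its learned gain and bias), not a library function. On random rows with nonzero mean, RMSNorm output has RMS close to 1 and L2 norm close to $\sqrt{D}$, and it matches that LayerNorm only once the rows are centered first.

```python
import numpy as np

def rmsnorm(x, gamma, epsilon=1e-8):
    # Divide by the root-mean-square over the last axis, then scale by gamma.
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + epsilon)
    return x / rms * gamma

def layernorm_no_affine(x, epsilon=1e-8):
    # Hypothetical helper: LayerNorm without gain/bias, for comparison only.
    mu = np.mean(x, axis=-1, keepdims=True)
    var = np.mean((x - mu) ** 2, axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + epsilon)

rng = np.random.default_rng(0)
D = 16
x = rng.normal(loc=1.0, size=(3, D))  # nonzero row means on purpose
y = rmsnorm(x, np.ones(D))

# Unit RMS per row, hence L2 norm sqrt(D), up to the tiny epsilon.
print(np.sqrt(np.mean(y ** 2, axis=-1)))
print(np.linalg.norm(y, axis=-1), np.sqrt(D))

# Rows with nonzero mean: RMSNorm and LayerNorm differ.
print(np.allclose(y, layernorm_no_affine(x)))  # False
# Centered rows: the two coincide.
xc = x - x.mean(axis=-1, keepdims=True)
print(np.allclose(rmsnorm(xc, np.ones(D)), layernorm_no_affine(xc)))  # True
```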

This problem ships 4 hidden tests. They run in your browser via Pyodide (no backend, no submission queue) and check:

  • Reference example
  • Each row has unit RMS when gamma = 1
  • gamma scales per feature
  • Shape is preserved
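The hidden tests themselves are not shown. The checks below are a hypothetical local approximation based only on the four test titles; the shapes, random gamma values, and seed are assumptions, so passing them does not guarantee passing the real tests.

```python
import numpy as np

def rmsnorm(x, gamma, epsilon=1e-8):
    # Candidate solution under test (same formula as the problem statement).
    x = np.asarray(x, dtype=float)
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + epsilon)
    return x / rms * gamma

rng = np.random.default_rng(42)

# 1. Reference example.
out = rmsnorm(np.array([[1, 2, 3, 4]]), np.ones(4))
assert np.allclose(out, [[0.3651, 0.7303, 1.0954, 1.4606]], atol=1e-4)

# 2. Each row has unit RMS when gamma = 1.
x = rng.normal(size=(5, 8))
y = rmsnorm(x, np.ones(8))
assert np.allclose(np.sqrt(np.mean(y ** 2, axis=-1)), 1.0, atol=1e-4)

# 3. gamma scales per feature: output equals the gamma = 1 output times gamma.
gamma = rng.uniform(0.5, 2.0, size=8)
assert np.allclose(rmsnorm(x, gamma), y * gamma, atol=1e-4)

# 4. Shape is preserved, including extra leading (batch) axes.
x3 = rng.normal(size=(2, 3, 8))
assert rmsnorm(x3, np.ones(8)).shape == x3.shape

print("all local checks passed")
```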