Group normalization
Background
Group Normalization splits a layer's channels into groups and normalizes each group's activations per sample, then applies a per-channel affine transform. Unlike batch norm, it doesn't depend on the batch dimension, so it works with tiny batches (common in detection and segmentation), where batch-norm statistics are unreliable.
Problem statement
Implement group_normalization(X, gamma, beta, num_groups, epsilon=1e-5) for X of shape (B, C, H, W). Split the channels into num_groups groups; for each (sample, group) normalize over the group's channels and spatial dimensions, then apply the per-channel scale/shift:
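$$y = \gamma \cdot \frac{x - \mu_g}{\sqrt{\sigma_g^2 + \epsilon}} + \beta$$
where $\mu_g$ and $\sigma_g^2$ are the mean and variance over each group's elements.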
Input
- X: np.ndarray of shape (B, C, H, W).
- gamma, beta: np.ndarray of shape (C,); per-channel scale and shift.
- num_groups: int; must divide C.
- epsilon: float.
Output
Returns an np.ndarray of shape (B, C, H, W).
Examples
Example 1
Input: X = [[[[1, 2]], [[3, 4]]]] (shape 1x2x1x2), gamma = [1,1], beta = [0,0], num_groups = 1
Output: [[[[-1.3416, -0.4472]], [[0.4472, 1.3416]]]]
Explanation: with one group, all four values are normalized together (mean 2.5, std $\sqrt{1.25} \approx 1.118$), then the identity affine is applied.
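The arithmetic in Example 1 can be checked by hand. The snippet below is not the requested function; it only reproduces the numbers for the single-group case, assuming the default epsilon=1e-5 and the population variance (NumPy's default, ddof=0).

```python
import numpy as np

# Example 1 input, shape (1, 2, 1, 2). With num_groups = 1 all four values share one group.
X = np.array([[[[1.0, 2.0]], [[3.0, 4.0]]]])
epsilon = 1e-5  # assumed: the default from the problem statement

mean = X.mean()  # 2.5
var = X.var()    # population variance: 1.25
X_hat = (X - mean) / np.sqrt(var + epsilon)

# gamma = [1, 1] and beta = [0, 0] leave the normalized values unchanged.
example_output = np.round(X_hat, 4)
print(example_output.tolist())
```

Rounded to four decimals, the values agree with the expected output listed in Example 1.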
Constraints
- Reshape to (B, num_groups, group_size, H, W) and reduce mean/var over axes (2, 3, 4).
- Apply per-channel gamma/beta (broadcast as (1, C, 1, 1)) after reshaping back.
- num_groups divides C; tests compare with atol=1e-4.
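The constraints above map fairly directly onto NumPy calls. The following is one possible sketch, not a reference solution: the float64 cast and the ValueError for a non-dividing num_groups are assumptions added for illustration, since the problem only states that num_groups divides C.

```python
import numpy as np


def group_normalization(X, gamma, beta, num_groups, epsilon=1e-5):
    """Group-normalize X of shape (B, C, H, W) with per-channel gamma/beta of shape (C,)."""
    X = np.asarray(X, dtype=np.float64)  # assumed: compute in float64
    B, C, H, W = X.shape
    if C % num_groups != 0:
        # Assumed error handling; the problem guarantees divisibility.
        raise ValueError("num_groups must divide C")
    group_size = C // num_groups

    # Group the channels: (B, C, H, W) -> (B, num_groups, group_size, H, W).
    grouped = X.reshape(B, num_groups, group_size, H, W)

    # One mean/variance per (sample, group), over the group's channels and spatial dims.
    mean = grouped.mean(axis=(2, 3, 4), keepdims=True)
    var = grouped.var(axis=(2, 3, 4), keepdims=True)
    normalized = (grouped - mean) / np.sqrt(var + epsilon)

    # Restore the original layout, then apply the per-channel affine.
    normalized = normalized.reshape(B, C, H, W)
    gamma = np.asarray(gamma, dtype=np.float64).reshape(1, C, 1, 1)
    beta = np.asarray(beta, dtype=np.float64).reshape(1, C, 1, 1)
    return gamma * normalized + beta


# Usage with the input from Example 1.
X = np.array([[[[1, 2]], [[3, 4]]]])
out = group_normalization(X, gamma=[1, 1], beta=[0, 0], num_groups=1)
print(np.round(out, 4))
```

Using keepdims=True keeps the statistics broadcastable against the grouped tensor, so no manual reshaping of the mean and variance is needed.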
Notes
- num_groups = 1 is Layer Norm (all channels together); num_groups = C is Instance Norm (each channel alone). GroupNorm interpolates between them.
- Statistics are computed per sample, so GroupNorm behaves identically at train and test time, unlike BatchNorm.
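Both notes can be checked numerically. The sketch below compares only the normalization step (no gamma/beta), since the affine parameters of Layer Norm and Instance Norm layers vary between libraries. normalize_groups is a hypothetical helper introduced here for illustration, and the random input and its shape are arbitrary choices.

```python
import numpy as np


def normalize_groups(X, num_groups, epsilon=1e-5):
    # Hypothetical helper: the group-wise normalization step only, without gamma/beta.
    B, C, H, W = X.shape
    g = X.reshape(B, num_groups, C // num_groups, H, W)
    mean = g.mean(axis=(2, 3, 4), keepdims=True)
    var = g.var(axis=(2, 3, 4), keepdims=True)
    return ((g - mean) / np.sqrt(var + epsilon)).reshape(B, C, H, W)


rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4, 5, 5))  # arbitrary (B, C, H, W)
eps = 1e-5

# num_groups = 1: statistics over (C, H, W) per sample, as in Layer Norm.
layer_like = (X - X.mean(axis=(1, 2, 3), keepdims=True)) / np.sqrt(
    X.var(axis=(1, 2, 3), keepdims=True) + eps
)
matches_layer_norm = np.allclose(normalize_groups(X, 1), layer_like)

# num_groups = C: statistics over (H, W) per sample and channel, as in Instance Norm.
instance_like = (X - X.mean(axis=(2, 3), keepdims=True)) / np.sqrt(
    X.var(axis=(2, 3), keepdims=True) + eps
)
matches_instance_norm = np.allclose(normalize_groups(X, 4), instance_like)

# Per-sample statistics: a sample's output does not depend on the rest of the batch.
batch_independent = np.allclose(normalize_groups(X, 2)[:1], normalize_groups(X[:1], 2))

print(matches_layer_norm, matches_instance_norm, batch_independent)
```

All three comparisons hold, which is why no running statistics need to be tracked for inference.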
This problem ships 4 hidden tests, which run in the browser via Pyodide (no backend, no submission queue):
- Reference example
- Each group is normalized to ~0 mean and ~1 var
- Output shape matches input
- Affine: gamma scales and beta shifts