Efficient sparse window attention
Background
Dense attention costs in sequence length — prohibitive for long contexts. Sliding-window (local) attention restricts each query to a fixed window of nearby keys, cutting cost to . It is the backbone of long-context models like Longformer and Mistral: most relevant context is local, and stacking windowed layers still grows the effective receptive field.
Problem statement
Implement sparse_window_attention(Q, K, V, window_size, scale_factor=None). For each query position , attend only to keys in the window (clipped to the sequence):
where is window_size and is scale_factor (defaults to ). Use the max-subtraction trick for a stable softmax.
Input
Q,K—np.ndarrayof shape(seq_len, d_k).V—np.ndarrayof shape(seq_len, d_v).window_size—int, the window radius (positions attended on each side).scale_factor—floatorNone; ifNone, use .
Output
An np.ndarray of shape (seq_len, d_v): the windowed attention output.
Examples
Example 1
Input: Q = [[1],[1],[1]], K = [[1],[1],[1]], V = [[1],[2],[3]], window_size = 1
Output: [[1.5], [2.0], [2.5]]
Explanation: position 0 sees V[0:2] → mean 1.5; position 1 sees all three → mean 2.0; position 2 sees V[1:3] → mean 2.5. All scores are equal here, so each window is a simple average.
Constraints
- The window is inclusive on both sides:
start = max(0, i - w),end = min(seq_len, i + w + 1). - Softmax is computed only over the window (with max-subtraction), then applied to the window's values.
scale_factordefaults to , matching standard scaled dot-product attention.
Notes
- A window of radius recovers full dense attention (every query sees every key).
- A window of radius 0 makes each token attend only to itself, so the output equals
V.
This problem ships 5 hidden tests. They run in your browser via Pyodide — no backend, no submission queue. Press ▶ Run tests to execute.
- •Reference example
- •Window radius >= seq_len recovers dense attention
- •Window radius 0 makes each token attend to itself (output == V)
- •Output shape is (seq_len, d_v)
- •A query only sees keys inside its window