Sequence padding & masking
Background
Models that train in batches need every sequence in a batch to be the same length, but real text/event sequences vary. The fix is padding: extend short sequences with a dummy value up to a common length (and truncate the over-long ones). A parallel mask marks which positions are real (1) versus padding (0) so downstream layers — losses, RNN steps, attention — can ignore the filler.
Problem statement
Implement pad_and_mask(sequences, max_len=None, pad_value=0) that post-pads/truncates a list of integer sequences to a common length and returns (padded, mask):
- If max_len is None, use the length of the longest sequence.
- Sequences longer than max_len are truncated from the end (keep the first max_len tokens).
- Padding is appended at the end (post-padding) with pad_value.
- mask[i, t] = 1 if position t is a real token of sequence i, else 0.
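The mask rule can also be computed without a loop, by comparing each position index against the sequence's clamped length. The following is a minimal sketch assuming NumPy; the helper name mask_from_lengths is hypothetical and not part of the required API.

```python
import numpy as np

def mask_from_lengths(lengths, max_len):
    """Return a (num_sequences, max_len) 0/1 mask from raw sequence lengths.

    Hypothetical helper for illustration; not part of the required API.
    """
    # Clamp so over-long sequences count as exactly max_len real tokens.
    clamped = np.minimum(np.asarray(lengths, dtype=np.int64), max_len)
    # positions has shape (max_len,); clamped[:, None] has shape (n, 1).
    # Broadcasting gives an (n, max_len) boolean grid: True where t < length.
    positions = np.arange(max_len)
    return (positions < clamped[:, None]).astype(np.int64)

# Lengths of the sequences [[1, 2, 3], [4, 5], [6]]
mask = mask_from_lengths([3, 2, 1], max_len=3)
print(mask)
```

Because the comparison is t < min(len(seq), max_len), truncated rows come out as all ones and empty rows as all zeros.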
Input
- sequences: list of lists/arrays of int tokens (variable length).
- max_len: int or None; the target length.
- pad_value: int used to fill padded positions.
Output
A tuple (padded, mask):
- padded: np.ndarray of shape (num_sequences, max_len).
- mask: np.ndarray of shape (num_sequences, max_len) with values in {0, 1}.
Examples
Example 1
Input: sequences = [[1, 2, 3], [4, 5], [6]], max_len = None
Output: padded = [[1, 2, 3],
                  [4, 5, 0],
                  [6, 0, 0]]
        mask   = [[1, 1, 1],
                  [1, 1, 0],
                  [1, 0, 0]]
Explanation: the longest sequence has length 3, so all rows are padded to 3 with pad_value=0; the mask is 0 at the three padded cells (one in row 2, two in row 3).
Constraints
- Post-padding and post-truncating (operate on the tail).
- The mask must be 1 exactly at the min(len(seq), max_len) leading positions of each row.
- Output arrays both have shape (num_sequences, max_len).
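One possible way to satisfy these constraints is sketched below. It is not the reference solution behind the hidden tests. It assumes NumPy, an int64 output dtype (the statement does not fix a dtype), and a target length of 0 when sequences is empty and max_len is None.

```python
import numpy as np

def pad_and_mask(sequences, max_len=None, pad_value=0):
    """Post-pad/post-truncate integer sequences; return (padded, mask)."""
    if max_len is None:
        # Default target: the longest sequence (assumed 0 if the list is empty).
        max_len = max((len(seq) for seq in sequences), default=0)

    n = len(sequences)
    # Start from an all-padding array and an all-zero mask,
    # then copy the real tokens into the head of each row.
    padded = np.full((n, max_len), pad_value, dtype=np.int64)  # dtype is an assumption
    mask = np.zeros((n, max_len), dtype=np.int64)

    for i, seq in enumerate(sequences):
        keep = min(len(seq), max_len)  # clamped length of this row
        # Keep the first `keep` tokens; anything beyond max_len is dropped.
        padded[i, :keep] = np.asarray(seq[:keep], dtype=np.int64)
        mask[i, :keep] = 1
    return padded, mask

# Example 1: max_len inferred from the longest sequence.
padded, mask = pad_and_mask([[1, 2, 3], [4, 5], [6]])

# Explicit max_len with a custom pad_value: the first row loses its tail.
short, short_mask = pad_and_mask([[1, 2, 3], [4, 5], [6]], max_len=2, pad_value=-1)
print(short)       # rows: [1, 2], [4, 5], [6, -1]
print(short_mask)  # rows: [1, 1], [1, 1], [1, 0]
```

Filling the arrays first and then overwriting the leading positions keeps padding and truncation in a single slice assignment per row.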
Notes
- The mask later lets attention set padded positions to -∞ before softmax (so they get zero weight), and lets losses skip padded targets.
- Some pipelines pre-pad instead (filler at the front) — common for RNNs so the last real token sits at the final time step.
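The first note can be illustrated with a small NumPy sketch. It assumes a score matrix of shape (num_sequences, max_len) with one score per position and at least one real token per row, since a fully masked row would leave the softmax undefined. The names masked_softmax and masked_mean_loss are hypothetical.

```python
import numpy as np

def masked_softmax(scores, mask):
    """Softmax over the last axis that gives padded positions zero weight."""
    # Replace padded scores with -inf so that exp(-inf) contributes exactly 0.
    masked = np.where(mask.astype(bool), scores, -np.inf)
    # Subtract the row max for numerical stability
    # (assumes each row has at least one real token).
    masked = masked - masked.max(axis=-1, keepdims=True)
    weights = np.exp(masked)
    return weights / weights.sum(axis=-1, keepdims=True)

def masked_mean_loss(per_token_loss, mask):
    """Average a per-token loss over real tokens only."""
    total = (per_token_loss * mask).sum()
    # Guard against division by zero when the batch has no real tokens.
    return total / max(mask.sum(), 1)

scores = np.array([[2.0, 1.0, 0.5],
                   [0.3, 0.3, 9.0]])   # 9.0 sits on a padded position
mask = np.array([[1, 1, 1],
                 [1, 1, 0]])

weights = masked_softmax(scores, mask)
print(weights[1])  # the padded position gets weight 0; the rest share 0.5 each

per_token_loss = np.array([[1.0, 2.0, 3.0],
                           [4.0, 5.0, 100.0]])  # 100.0 is on padding
loss = masked_mean_loss(per_token_loss, mask)
print(loss)  # (1 + 2 + 3 + 4 + 5) / 5 = 3.0
```

Without the mask, the large score and loss at the padded position would dominate both results.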
This problem ships 5 hidden tests, which run in your browser via Pyodide (no backend, no submission queue):
- Reference example
- Explicit max_len truncates the tail
- Custom pad_value is used for filler
- Mask row sums equal clamped original lengths
- Output shapes are (num_sequences, max_len)