Token embedding lookup

Difficulty: Easy

Background

Token embedding is the very first thing a transformer does: it turns a sequence of integer token ids into a sequence of learnable vectors. The embedding table is one big matrix with one row per vocabulary entry, and it is huge: about a third of GPT-2's 124M parameters (50257 × 768 ≈ 38.6M) live in this single table. The operation itself is the simplest in the whole model, a row lookup. The point of this problem is to recognise that you do not need a Python loop over the batch: NumPy fancy indexing does it in one operation.

Problem statement

Implement token_embedding(idx, weight): for every token id in idx, return the corresponding row of the embedding table. With $\text{idx} \in \{0, \dots, V-1\}^{B\times T}$ and $\text{weight} \in \mathbb{R}^{V\times C}$:

$$\text{out}[b, t, :] = \text{weight}\big[\,\text{idx}[b, t]\,,\; :\,\big], \qquad \text{out} \in \mathbb{R}^{B\times T\times C}$$

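In NumPy this is exactly weight[idx]: fancy (integer-array) indexing on the first axis returns an array whose shape is the shape of idx followed by the remaining axes of weight, so a (B, T) index array produces a (B, T, C) result.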
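The formula above can be sketched in a few lines. This is a minimal illustration, not the reference solution: the sizes B, T, V, C and the random seed are arbitrary assumptions, and the np.asarray call is an added convenience so that nested lists are accepted too.

```python
import numpy as np

def token_embedding(idx, weight):
    """Return one row of `weight` per token id in `idx`."""
    idx = np.asarray(idx)  # convenience (assumption): accept nested lists as well
    # Integer-array indexing on axis 0:
    # result shape is idx.shape + weight.shape[1:].
    return weight[idx]

# Hypothetical sizes, chosen only for illustration.
B, T, V, C = 2, 4, 10, 3
rng = np.random.default_rng(0)
weight = rng.standard_normal((V, C))
idx = rng.integers(0, V, size=(B, T))

out = token_embedding(idx, weight)
print(out.shape)  # (2, 4, 3)
```

Each out[b, t] is the row weight[idx[b, t]], which is the definition given in the problem statement.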

Input

  • idx: np.ndarray of shape (B, T), integer token ids, each in $[0, V)$.
  • weight: np.ndarray of shape (V, C): the embedding table; row $i$ is the vector for token id $i$.

Output

Returns an np.ndarray of shape (B, T, C) — the embedding vector for each token in idx.

Examples

Example 1 — identity table makes the lookup obvious

Input:  weight = np.eye(5), idx = [[3, 0, 1]]
Output: [[[0, 0, 0, 1, 0],
          [1, 0, 0, 0, 0],
          [0, 1, 0, 0, 0]]]        # shape (1, 3, 5)

Explanation: with the identity table, row $i$ is the $i$-th basis vector. Token 3 picks row 3 → [0,0,0,1,0], token 0 picks row 0, token 1 picks row 1.
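Example 1 can be reproduced directly. The snippet below is a small sketch that indexes the identity table inline rather than calling token_embedding; the cast to int is only there to make the printed rows match the integers shown above.

```python
import numpy as np

weight = np.eye(5)            # row i is the i-th basis vector
idx = np.array([[3, 0, 1]])   # shape (1, 3)

out = weight[idx]             # shape (1, 3, 5)
print(out.shape)
print(out[0].astype(int))     # rows 3, 0 and 1 of the identity matrix
```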

Example 2 — the same id always maps to the same vector

Input:  weight: a (5, 16) table, idx = [[2, 0, 2, 1, 2]]
Output: shape (1, 5, 16);  out[0,0] == out[0,2] == out[0,4] == weight[2]

Explanation: token id 2 appears at positions 0, 2, and 4; each looks up the same row weight[2], so all three vectors are identical. A correct lookup guarantees this for free.
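Example 2 can be checked the same way. In this sketch the (5, 16) table is filled with random values from an arbitrary seed, since the problem does not specify its contents.

```python
import numpy as np

rng = np.random.default_rng(0)          # seed is arbitrary
weight = rng.standard_normal((5, 16))   # contents are an assumption
idx = np.array([[2, 0, 2, 1, 2]])

out = weight[idx]

# Token id 2 sits at positions 0, 2 and 4, so those vectors must all equal weight[2].
same = all(np.array_equal(out[0, t], weight[2]) for t in (0, 2, 4))
print(out.shape, same)  # (1, 5, 16) True
```

NumPy's integer-array indexing returns a copy, so the repeated rows are equal in value but do not share memory with weight.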

Constraints

  • Indices are integers in $[0, V)$; the output takes the (float) dtype of weight and has shape (B, T, C).
  • Use vectorised fancy indexing (weight[idx]); a Python for loop over (B, T) adds $O(B\cdot T)$ interpreter overhead to what is a single C-level operation.
  • A repeated token id must yield the identical row at every occurrence (atol=1e-12).
  • Must scale to a GPT-2-sized vocabulary (e.g. $V = 50257$, $C = 64$) without materialising any per-token intermediate of size $V$, such as a one-hot vector.
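To make the loop constraint concrete, the sketch below compares the vectorised lookup against an explicit double loop at the shapes mentioned above. token_embedding_loop is a hypothetical helper written only for this comparison, and the random table is an assumption.

```python
import numpy as np

def token_embedding_loop(idx, weight):
    """Explicit Python loop over (B, T); for comparison only."""
    B, T = idx.shape
    out = np.empty((B, T, weight.shape[1]), dtype=weight.dtype)
    for b in range(B):
        for t in range(T):
            out[b, t] = weight[idx[b, t]]  # one row copy per token
    return out

V, C, B, T = 50257, 64, 2, 8
rng = np.random.default_rng(0)
weight = rng.standard_normal((V, C))       # about 26 MB as float64
idx = rng.integers(0, V, size=(B, T))

fast = weight[idx]                         # single vectorised lookup
slow = token_embedding_loop(idx, weight)   # B * T interpreter-level steps
print(fast.shape, np.array_equal(fast, slow))  # (2, 8, 64) True
```

Both versions return the same array; the difference is that the loop pays Python-level overhead once per token, while weight[idx] performs the whole gather in one call.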

Notes

  • Equivalent but wasteful. A lookup equals a one-hot matmul, np.eye(V)[idx] @ weight, but that builds a (V, V) identity matrix and a (B, T, V) one-hot tensor: fine for $V=10$, fatal for $V=50257$. The lookup skips both entirely.
  • Series. This is step 1 of the build-gpt track; later steps add positional encodings, attention, layer norm, and the full transformer block.
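The one-hot equivalence from the first note can be verified on a small vocabulary. This is a sketch with assumed sizes (V = 10, C = 4) and an arbitrary random table; it is only practical because V is tiny.

```python
import numpy as np

V, C = 10, 4
rng = np.random.default_rng(0)
weight = rng.standard_normal((V, C))
idx = np.array([[3, 0, 3], [9, 1, 2]])   # shape (2, 3)

one_hot = np.eye(V)[idx]                 # (2, 3, 10): one length-V row per token
via_matmul = one_hot @ weight            # (2, 3, 4)
via_lookup = weight[idx]                 # (2, 3, 4), no intermediate of size V

print(one_hot.shape)
print(np.allclose(via_matmul, via_lookup))

# Size of the identity matrix alone at GPT-2 vocabulary size, in GB (float64).
print(50257 ** 2 * 8 / 1e9)
```

At V = 50257 the identity matrix by itself would need roughly 20 GB as float64, before the one-hot tensor or the matmul are even considered.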

This problem ships 5 hidden tests. They run in your browser via Pyodide, with no backend and no submission queue.

  • Output shape is (B, T, C)
  • Result matches direct row indexing weight[idx]
  • Same id at different positions returns identical vectors
  • Single batch (B=1) works
  • GPT-2-scale shapes work (vocab=50257, n_embd=64, B=2, T=8)