Prof by Lex AI Labs


Transformer

20 problems

  • Token embedding lookup [Easy]
  • Sinusoidal positional encoding [Easy]
  • Scaled dot-product self-attention [Medium]
  • Causal mask: build + apply [Easy]
  • Multi-head split + combine [Medium]
  • Multi-Head Attention (full layer) [Medium]
  • LayerNorm forward [Easy]
  • Transformer block forward (pre-LN, residual) [Hard]
  • Dynamic-Tanh (DyT) [Medium]
  • Efficient sparse window attention [Medium]
  • FlashAttention tiled forward [Hard]
  • GELU forward (tanh approximation) [Easy]
  • Grouped-query attention [Medium]
  • KV cache for autoregressive inference [Medium]
  • KV cache compression (MLA) [Hard]
  • Noisy top-k gating [Medium]
  • RMSNorm [Easy]
  • Rotary Position Embedding (RoPE) [Hard]
  • Sparse mixture-of-experts layer [Hard]
  • SwiGLU activation [Medium]