Phase 19: Capstone Projects
AI From Scratch/Lesson 34/~90 minutes

Transformer Block from Scratch

One block is the repeating unit of every modern decoder LLM: layer norm, multi-head attention, residual, MLP, residual. The pre-LN variant (normalize before each sublayer) typically trains stably even without learning-rate warmup. The post-LN variant (normalize after each residual add) is what the original Transformer paper shipped. This lesson builds both,...
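To make the difference between the two variants concrete, here is a minimal NumPy sketch of one decoder block in both orderings. It is not the lesson's own code, and several simplifications are assumed: a single unbatched sequence of shape (T, d_model), a ReLU MLP with a 4x hidden width, no dropout, and no attention biases. Every name in it (`init_params`, `pre_ln_block`, `post_ln_block`, and the parameter keys) is hypothetical and chosen only for illustration.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Normalize each token vector over the feature axis, then scale and shift.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def softmax(x):
    # Subtract the row max for numerical stability.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def causal_self_attention(x, p, n_heads):
    T, D = x.shape
    hd = D // n_heads  # per-head width; assumes D is divisible by n_heads

    def split(w):
        # Project, then reshape (T, D) -> (n_heads, T, hd).
        return (x @ w).reshape(T, n_heads, hd).transpose(1, 0, 2)

    q, k, v = split(p["Wq"]), split(p["Wk"]), split(p["Wv"])
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(hd)       # (n_heads, T, T)
    future = np.triu(np.ones((T, T), dtype=bool), k=1)    # True above the diagonal
    scores = np.where(future, -np.inf, scores)            # block attention to later tokens
    out = softmax(scores) @ v                             # (n_heads, T, hd)
    out = out.transpose(1, 0, 2).reshape(T, D)            # merge heads back to (T, D)
    return out @ p["Wo"]

def mlp(x, p):
    # Assumption: ReLU activation; many modern models use GELU or a gated variant.
    h = np.maximum(0.0, x @ p["W1"] + p["b1"])
    return h @ p["W2"] + p["b2"]

def pre_ln_block(x, p, n_heads):
    # Pre-LN: normalize the sublayer input; the residual path is left untouched.
    x = x + causal_self_attention(layer_norm(x, p["g1"], p["be1"]), p, n_heads)
    x = x + mlp(layer_norm(x, p["g2"], p["be2"]), p)
    return x

def post_ln_block(x, p, n_heads):
    # Post-LN: add the residual first, then normalize the sum.
    x = layer_norm(x + causal_self_attention(x, p, n_heads), p["g1"], p["be1"])
    x = layer_norm(x + mlp(x, p), p["g2"], p["be2"])
    return x

def init_params(d_model, rng, d_ff=None):
    # Hypothetical initializer: small Gaussian weights, identity layer-norm parameters.
    d_ff = d_ff or 4 * d_model
    w = lambda *shape: rng.normal(0.0, 0.02, size=shape)
    return {
        "Wq": w(d_model, d_model), "Wk": w(d_model, d_model),
        "Wv": w(d_model, d_model), "Wo": w(d_model, d_model),
        "W1": w(d_model, d_ff), "b1": np.zeros(d_ff),
        "W2": w(d_ff, d_model), "b2": np.zeros(d_model),
        "g1": np.ones(d_model), "be1": np.zeros(d_model),
        "g2": np.ones(d_model), "be2": np.zeros(d_model),
    }

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_params(d_model=16, rng=rng)
    tokens = rng.normal(size=(5, 16))  # 5 tokens, 16 features each
    print(pre_ln_block(tokens, params, n_heads=4).shape)
    print(post_ln_block(tokens, params, n_heads=4).shape)
```

The two functions use identical parameters and sublayers; only the position of `layer_norm` relative to the residual add changes. In the pre-LN ordering the output of the block is not normalized, so stacks built this way usually apply one extra layer norm after the last block.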

Build / Python