AI From Scratch / Lesson 34 / ~90 minutes
Transformer Block from Scratch
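One block is the repeating unit of every modern decoder LLM: layer norm, multi-head attention, residual, layer norm, MLP, residual. The pre-LN variant normalizes the input to each sublayer and typically trains stably with little or no learning-rate warmup. The post-LN variant normalizes after each residual add and is what the original Transformer paper shipped. This lesson builds both,...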
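To make the difference between the two orderings concrete, here is a minimal NumPy sketch of both variants. It is not the lesson's own code. The helper names (`init_params`, `attention`, `mlp`, `pre_ln_block`, `post_ln_block`), the 0.02 init scale, the ReLU activation, the missing biases and learned LayerNorm gain, and the single unbatched sequence are all assumptions chosen to keep the example short. A causal mask is assumed because the block is described as a decoder block.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token vector to zero mean and unit variance.
    # Assumption: no learned gain/bias, to keep the sketch small.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def init_params(d_model, d_ff, seed=0):
    # Hypothetical parameter layout: four weight matrices, no biases.
    rng = np.random.default_rng(seed)
    return {
        "w_qkv": rng.normal(0.0, 0.02, (d_model, 3 * d_model)),
        "w_o": rng.normal(0.0, 0.02, (d_model, d_model)),
        "w_1": rng.normal(0.0, 0.02, (d_model, d_ff)),
        "w_2": rng.normal(0.0, 0.02, (d_ff, d_model)),
    }

def attention(x, p, n_heads):
    # Causal multi-head self-attention on one sequence of shape (T, d_model).
    T, d = x.shape
    hd = d // n_heads
    q, k, v = np.split(x @ p["w_qkv"], 3, axis=-1)
    # Reshape (T, d_model) -> (n_heads, T, head_dim).
    q, k, v = (a.reshape(T, n_heads, hd).transpose(1, 0, 2) for a in (q, k, v))
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(hd)   # (n_heads, T, T)
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)        # hide future tokens
    out = softmax(scores) @ v                         # (n_heads, T, head_dim)
    return out.transpose(1, 0, 2).reshape(T, d) @ p["w_o"]

def mlp(x, p):
    # Position-wise feed-forward. Assumption: ReLU (GELU is also common).
    return np.maximum(x @ p["w_1"], 0.0) @ p["w_2"]

def pre_ln_block(x, p, n_heads):
    # Pre-LN: normalize the sublayer input; the residual path stays untouched.
    x = x + attention(layer_norm(x), p, n_heads)
    x = x + mlp(layer_norm(x), p)
    return x

def post_ln_block(x, p, n_heads):
    # Post-LN: add the residual first, then normalize the sum.
    x = layer_norm(x + attention(x, p, n_heads))
    x = layer_norm(x + mlp(x, p))
    return x

x = np.random.default_rng(1).normal(size=(4, 8))  # 4 tokens, d_model = 8
params = init_params(d_model=8, d_ff=32)
y_pre = pre_ln_block(x, params, n_heads=2)
y_post = post_ln_block(x, params, n_heads=2)
```

The only difference between the two functions is where `layer_norm` sits relative to the residual add. In the pre-LN version the input `x` reaches the output through plain additions, while in the post-LN version every residual sum passes through a normalization.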