Phase 10: LLMs from Scratch
AI From Scratch/Lesson 04/~120 minutes

Pre-Training a Mini GPT (124M Parameters)

GPT-2 Small has 124 million parameters: 12 transformer layers, 12 attention heads per layer, and 768-dimensional embeddings. You can train it from scratch on a single GPU in a few hours. Most people never do this. They use pre-trained checkpo...
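To see where the 124 million figure comes from, here is a rough sketch that counts the parameters from the layer count and embedding width given above. The vocabulary size (50,257), context length (1,024), 4x MLP expansion, and tied input/output embeddings are assumptions taken from the published GPT-2 Small configuration, not from this lesson, and `gpt2_param_count` is a hypothetical helper name.

```python
def gpt2_param_count(n_layer=12, d_model=768, vocab_size=50257, n_ctx=1024):
    """Count parameters of a GPT-2 style decoder (assumed configuration).

    Assumes learned positional embeddings, biases on every linear layer,
    a 4x MLP expansion, and an output head tied to the token embedding.
    """
    # Token embedding table plus learned positional embedding table.
    embeddings = vocab_size * d_model + n_ctx * d_model

    # Each LayerNorm has a scale and a shift vector.
    layer_norm = 2 * d_model

    # Attention: fused Q/K/V projection, then the output projection.
    attention = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)

    # MLP: expand to 4 * d_model, then project back down.
    d_ff = 4 * d_model
    mlp = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)

    # Two LayerNorms per block, plus one final LayerNorm after the last block.
    per_layer = 2 * layer_norm + attention + mlp
    return embeddings + n_layer * per_layer + layer_norm


total = gpt2_param_count()
print(f"{total:,}")  # 124,439,808
```

The number of attention heads does not appear in the count: the heads split the 768-dimensional projections into 12 slices of 64 dimensions each, so they change how the weights are used, not how many there are.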

Build | Python (with numpy) | No prerequisites