AI From Scratch/Lesson 04/~120 minutes
Pre-Training a Mini GPT (124M Parameters)
GPT-2 Small has 124 million parameters: 12 transformer layers, 12 attention heads per layer, and 768-dimensional embeddings. You can train it from scratch on a single GPU in a few hours. Most people never do this. They use pre-trained checkpo...
Build / Python (with numpy) / No prerequisites
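To see where the 124 million figure comes from, here is a rough parameter tally. It is a sketch, not the lesson's own code. The lesson text gives the layer count, head count, and embedding width. The vocabulary size (50,257), context length (1,024), 4x feed-forward width, and tied input/output embeddings are assumptions taken from the published GPT-2 Small configuration. The function name `gpt2_small_param_count` is hypothetical.

```python
def gpt2_small_param_count(vocab_size=50257, context_len=1024,
                           d_model=768, n_layers=12, n_heads=12):
    """Tally learnable parameters for a GPT-2 style decoder.

    vocab_size, context_len, and the 4x MLP width are assumed from the
    published GPT-2 Small config; they are not stated in the lesson text.
    """
    # Heads split d_model into equal slices; they add no parameters of their own.
    assert d_model % n_heads == 0
    d_ff = 4 * d_model  # assumed feed-forward width

    token_emb = vocab_size * d_model   # also reused as the output projection (tied)
    pos_emb = context_len * d_model    # learned position embeddings
    layer_norm = 2 * d_model           # scale and shift vectors

    # Attention: fused Q/K/V projection plus output projection, each with bias.
    attn = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # MLP: expand to d_ff, then project back to d_model, each with bias.
    mlp = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)

    per_layer = 2 * layer_norm + attn + mlp
    total = token_emb + pos_emb + n_layers * per_layer + layer_norm  # final LayerNorm

    return {
        "token_emb": token_emb,
        "pos_emb": pos_emb,
        "per_layer": per_layer,
        "total": total,
    }


counts = gpt2_small_param_count()
for name, value in counts.items():
    print(f"{name:>10}: {value:,}")
```

Under these assumptions the total comes to 124,439,808. The token embedding table accounts for roughly 38.6 million of that, and the 12 transformer layers for about 85 million. The head count does not change the total, because the heads only partition the 768 dimensions into 64-dimensional slices.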