Phase 07: Transformers Deep Dive
AI From Scratch/Lesson 13/~45 minutes

Scaling Laws

The 2020 Kaplan et al. paper said: bigger model, lower loss. The 2022 Hoffmann et al. (Chinchilla) paper said: you were under-training. A fixed compute budget goes into two buckets, parameters and training tokens, and the best split between them is not obvious.
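
To make the two-bucket picture concrete, here is a minimal sketch of how a compute budget might be split. It rests on two assumptions that are not stated in the lesson text: the common approximation that training cost is about 6 FLOPs per parameter per token (C ≈ 6·N·D), and the rough rule of thumb often quoted from the Hoffmann et al. results of about 20 training tokens per parameter. The function names and constants are illustrative, not taken from either paper's code.

```python
import math

# Assumption: training cost is roughly 6 FLOPs per parameter per token.
FLOPS_PER_PARAM_TOKEN = 6.0
# Assumption: rule-of-thumb ratio of training tokens to parameters.
TOKENS_PER_PARAM = 20.0


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute C = 6 * N * D."""
    return FLOPS_PER_PARAM_TOKEN * n_params * n_tokens


def compute_optimal_split(flops_budget: float,
                          tokens_per_param: float = TOKENS_PER_PARAM):
    """Split a FLOP budget into (parameters, tokens).

    With C = 6 * N * D and D = r * N, solving for N gives
    N = sqrt(C / (6 * r)).
    """
    n_params = math.sqrt(flops_budget / (FLOPS_PER_PARAM_TOKEN * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    for budget in (1e20, 1e22, 1e24):
        n, d = compute_optimal_split(budget)
        print(f"C={budget:.0e} FLOPs -> N~{n:.2e} params, D~{d:.2e} tokens")
```

Under these assumptions both buckets grow with the square root of compute: quadrupling the budget doubles the parameter count and doubles the token count. The exact ratio is an empirical estimate, so treat the 20 as a tunable input rather than a constant of nature.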
