AI From Scratch/Lesson 13/~45 minutes
Scaling Laws
The 2020 Kaplan et al. paper said: bigger model, lower loss. The 2022 Hoffmann et al. (Chinchilla) paper said: you were under-training. Compute goes into two buckets, parameters and tokens, and the split is not obvious.
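To make the two-bucket idea concrete, here is a minimal sketch of splitting a fixed compute budget between parameters and tokens. It rests on two assumptions that the text above does not state: the common approximation that training costs about 6 FLOPs per parameter per token (C ≈ 6·N·D), and the rough Chinchilla-style rule of thumb of about 20 training tokens per parameter. The function names and constants are illustrative only, and the real fitted ratios depend on the data and setup.

```python
import math

# Assumption: training cost is roughly 6 FLOPs per parameter per token (C ≈ 6 * N * D).
FLOPS_PER_PARAM_TOKEN = 6
# Assumption: rough Chinchilla-style rule of thumb, about 20 tokens per parameter.
TOKENS_PER_PARAM = 20


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute for a model of n_params trained on n_tokens."""
    return FLOPS_PER_PARAM_TOKEN * n_params * n_tokens


def compute_optimal_split(compute_flops: float,
                          tokens_per_param: float = TOKENS_PER_PARAM):
    """Split a FLOP budget into (parameters, tokens) under the assumptions above.

    Substituting D = r * N into C = 6 * N * D gives N = sqrt(C / (6 * r)).
    """
    if compute_flops <= 0 or tokens_per_param <= 0:
        raise ValueError("compute_flops and tokens_per_param must be positive")
    n_params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    budget = 5.88e23  # hypothetical budget, in FLOPs
    n, d = compute_optimal_split(budget)
    print(f"params ≈ {n:.3g}, tokens ≈ {d:.3g}")
```

With this hypothetical budget, the sketch lands at roughly 70 billion parameters and 1.4 trillion tokens. Changing `tokens_per_param` shows how the same budget shifts toward a larger, less-trained model or a smaller, longer-trained one.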