AI From Scratch/Lesson 19/~60 minutes
Subword Tokenization — BPE, WordPiece, Unigram, SentencePiece
Word tokenizers choke on unseen words. Character tokenizers blow up sequence length. Subword tokenizers split the difference: frequent words stay whole and rare words break into smaller known pieces. Nearly every modern LLM ships with one.
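To make the trade-off concrete, here is a minimal sketch that tokenizes one word three ways. The vocabularies are invented for illustration, and the subword function uses a simple greedy longest-match split, which is an assumption of this sketch and not the BPE, WordPiece, or Unigram procedure itself.

```python
# Toy vocabularies, made up for this illustration only.
WORD_VOCAB = {"the", "model", "token", "is", "small"}
SUBWORD_VOCAB = {"token", "iz", "ation", "un", "er"} | set("abcdefghijklmnopqrstuvwxyz")


def word_tokenize(text):
    """Split on whitespace; any word outside the vocabulary becomes <unk>."""
    return [w if w in WORD_VOCAB else "<unk>" for w in text.split()]


def char_tokenize(text):
    """One token per non-whitespace character."""
    return [c for c in text if not c.isspace()]


def subword_tokenize(word):
    """Greedy longest-match split of a single word against SUBWORD_VOCAB."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate piece until it is found in the vocabulary.
        while end > start and word[start:end] not in SUBWORD_VOCAB:
            end -= 1
        if end == start:
            # No known piece starts here, not even a single character.
            return ["<unk>"]
        pieces.append(word[start:end])
        start = end
    return pieces


text = "tokenization"
word_tokens = word_tokenize(text)
char_tokens = char_tokenize(text)
subword_tokens = subword_tokenize(text)

print(word_tokens)       # ['<unk>']
print(len(char_tokens))  # 12
print(subword_tokens)    # ['token', 'iz', 'ation']
```

The word tokenizer loses the word entirely, the character tokenizer needs 12 tokens, and the subword split covers it in 3 pieces that each carry some meaning.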