Phase 05: NLP — Foundations to Advanced
AI From Scratch/Lesson 19/~60 minutes

Subword Tokenization — BPE, WordPiece, Unigram, SentencePiece

Word tokenizers choke on unseen words. Character tokenizers blow up sequence length. Subword tokenizers split the difference: frequent words stay whole, and rare words break into smaller known pieces. Nearly every modern LLM ships with one.
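To make the trade-off concrete, here is a minimal sketch comparing the three approaches on a toy corpus. It is an illustration under stated assumptions, not a production tokenizer: the function names (`word_tokenize`, `char_tokenize`, `learn_bpe`, `bpe_tokenize`), the `</w>` end-of-word marker, the `<unk>` token, and the corpus are all chosen for this example, and the BPE learner is the simplest greedy variant (merge the most frequent adjacent pair, repeat).

```python
from collections import Counter

def word_tokenize(text, vocab):
    # Word-level: any word outside the vocabulary collapses to <unk>.
    return [w if w in vocab else "<unk>" for w in text.split()]

def char_tokenize(word):
    # Character-level: never out-of-vocabulary, but one token per character.
    return list(word)

def merge_pair(symbols, pair):
    # Replace every adjacent occurrence of `pair` with the fused symbol.
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out

def learn_bpe(corpus, num_merges):
    # Start from characters plus an end-of-word marker (an assumption of this sketch).
    words = Counter(tuple(w) + ("</w>",) for w in corpus.split())
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in words.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        merged = Counter()
        for symbols, freq in words.items():
            merged[tuple(merge_pair(list(symbols), best))] += freq
        words = merged
    return merges

def bpe_tokenize(word, merges):
    # Apply the learned merges in the order they were learned.
    symbols = list(word) + ["</w>"]
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols

# Toy corpus: "lowest" never appears, but "low" and the suffix "est" do.
corpus = "low " * 5 + "lower " * 2 + "newest " * 6 + "widest " * 3
vocab = set(corpus.split())
merges = learn_bpe(corpus, num_merges=5)

print(word_tokenize("lowest", vocab))  # unseen word is lost
print(char_tokenize("lowest"))         # six single-character tokens
print(bpe_tokenize("lowest", merges))  # a few reusable subword pieces
```

The word-level tokenizer discards the unseen word entirely, the character-level one keeps it at the cost of a longer sequence, and the subword tokenizer rebuilds it from pieces it learned from other words. WordPiece and Unigram choose their pieces by different criteria, but they aim at the same middle ground.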
