AI From Scratch/Lesson 28/~45 minutes
Self-Hosted Serving Selection — llama.cpp, Ollama, TGI, vLLM, SGLang
Four engines dominate self-hosted inference in 2026. Pick based on hardware, scale, and ecosystem. llama.cpp is the fastest on CPU, with the widest model support and full control over quantization and threading. Ollama is the dev-laptop one-command insta...