Phase 17: Infrastructure & Production
AI From Scratch / Lesson 09 / ~75 minutes

Production Quantization — AWQ, GPTQ, GGUF K-quants, FP8, MXFP4/NVFP4

Quantization format is not a universal choice: it depends on the hardware, the serving engine, and the workload. GGUF Q4_K_M or Q5_K_M dominates CPU and edge deployments, served through llama.cpp and Ollama. GPTQ wins inside vLLM when you need multi-LoRA on...
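To make the memory side of this trade-off concrete, here is a minimal sketch that derives effective bits per weight from each format's block layout and converts it into weight storage for a hypothetical 8B-parameter model. The layouts are assumptions based on the formats' common definitions. GPTQ/AWQ is taken as int4 with group size 128, one FP16 scale and one packed 4-bit zero point per group. GGUF Q4_K and Q5_K use llama.cpp's 256-weight super-blocks. MXFP4 packs 32 E2M1 elements behind one 8-bit E8M0 scale, and NVFP4 packs 16 E2M1 elements behind one FP8 E4M3 scale. Per-tensor scales are ignored. `BlockLayout`, `LAYOUTS`, and `weight_gb` are illustrative names, not any library's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockLayout:
    """Assumed storage layout of one quantization block (illustrative only)."""
    name: str
    block_size: int       # weights covered by one block
    bits_per_weight: int  # bits stored for each quantized element
    overhead_bits: int    # per-block metadata: scales, mins, zero points

    def effective_bits(self) -> float:
        # Element bits plus block metadata amortized over the block.
        return self.bits_per_weight + self.overhead_bits / self.block_size


LAYOUTS = [
    BlockLayout("FP16/BF16", 1, 16, 0),
    BlockLayout("FP8 E4M3 (per-tensor scale ignored)", 1, 8, 0),
    # Assumed: group size 128, FP16 scale + packed 4-bit zero point per group.
    BlockLayout("GPTQ/AWQ int4 g128", 128, 4, 16 + 4),
    # 256-weight super-block: FP16 d + FP16 dmin + 12 bytes of 6-bit scales/mins.
    BlockLayout("GGUF Q4_K", 256, 4, 16 + 16 + 12 * 8),
    # Same metadata as Q4_K; the 5 bits are 4 low bits plus 1 high bit per weight.
    BlockLayout("GGUF Q5_K", 256, 5, 16 + 16 + 12 * 8),
    # 32 FP4 (E2M1) elements share one 8-bit E8M0 scale.
    BlockLayout("MXFP4", 32, 4, 8),
    # 16 FP4 (E2M1) elements share one FP8 E4M3 scale (per-tensor FP32 scale ignored).
    BlockLayout("NVFP4", 16, 4, 8),
]


def weight_gb(n_params: float, layout: BlockLayout) -> float:
    """Weight storage in GB (1e9 bytes); excludes KV cache and activations."""
    return n_params * layout.effective_bits() / 8 / 1e9


if __name__ == "__main__":
    n_params = 8e9  # hypothetical 8B-parameter dense model
    for layout in LAYOUTS:
        print(f"{layout.name:40s} {layout.effective_bits():6.3f} bpw  "
              f"{weight_gb(n_params, layout):6.2f} GB")
```

Q4_K_M and Q5_K_M keep some tensors at higher-precision K-quant types, so their average bits per weight sits somewhat above the plain Q4_K and Q5_K figures computed here. At 4 bits the formats differ mainly in metadata overhead (a few tenths of a bit), so hardware and engine support usually matter more than raw size.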

Python (stdlib): toy memory and throughput comparison across formats
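A toy throughput comparison in the same stdlib-only spirit: for batch-1 decoding of a dense model, each generated token reads roughly every weight once, so memory bandwidth divided by weight bytes gives a ceiling on tokens per second. This is a sketch under stated assumptions (no KV-cache traffic, no dequantization cost, perfect bandwidth utilization). The bandwidth figures, the effective bits-per-weight values, and the name `decode_tokens_per_s_upper_bound` are hypothetical.

```python
def decode_tokens_per_s_upper_bound(n_params: float, bits_per_weight: float,
                                    bandwidth_gb_s: float) -> float:
    """Memory-bound ceiling for batch-1 decode of a dense model.

    Assumes every weight is read once per generated token and nothing else
    (no KV-cache reads, no dequantization cost, perfect bandwidth use).
    """
    weight_bytes = n_params * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / weight_bytes


# Assumed effective bits per weight (element bits plus block-scale overhead).
FORMATS = {
    "FP16": 16.0,
    "FP8": 8.0,
    "int4 g128 (GPTQ/AWQ-style)": 4 + 20 / 128,
    "GGUF Q4_K": 4.5,
    "GGUF Q5_K": 5.5,
    "MXFP4": 4.25,
    "NVFP4": 4.5,
}

if __name__ == "__main__":
    n_params = 8e9  # hypothetical 8B dense model
    # Hypothetical bandwidths in GB/s; substitute your own hardware's figures.
    devices = [("CPU-class (hypothetical)", 100.0),
               ("GPU-class (hypothetical)", 3000.0)]
    for device, bandwidth in devices:
        print(device)
        for name, bpw in FORMATS.items():
            tps = decode_tokens_per_s_upper_bound(n_params, bpw, bandwidth)
            print(f"  {name:28s} <= {tps:8.1f} tok/s")
```

Because the ceiling scales inversely with bits per weight, the relative ordering of formats is more informative than the absolute numbers; real engines land below this bound.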