AI From Scratch, Lesson 09 (~75 minutes)
Production Quantization — AWQ, GPTQ, GGUF K-quants, FP8, MXFP4/NVFP4
Quantization format is not a universal choice — it is a function of hardware, serving engine, and workload. GGUF Q4_K_M or Q5_K_M owns CPU and edge, delivered through llama.cpp and Ollama. GPTQ wins inside vLLM when you need multi-LoRA on...
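One concrete reason the choice tracks hardware and engine is that each format spends a different number of bits per weight once its block metadata is counted. The sketch below derives bits per weight from block layouts. `BlockLayout` and `LAYOUTS` are hypothetical helpers. The GPTQ/AWQ row assumes group size 128 with an fp16 scale and a 4-bit zero point per group. The FP8 row assumes one per-tensor scale. The figures describe a single quantized tensor, and Q4_K_M and Q5_K_M files keep some tensors at higher-precision K-quant types, so their file-level average lands somewhat above the Q4_K and Q5_K block figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockLayout:
    """Hypothetical helper: storage layout of one quantized block."""
    name: str
    block_size: int  # number of weights sharing one set of metadata
    elem_bits: int   # bits per quantized element
    meta_bits: int   # scale / zero-point / min bits stored per block

    @property
    def bits_per_weight(self) -> float:
        # Element bits plus per-block metadata amortized over the block.
        return self.elem_bits + self.meta_bits / self.block_size


LAYOUTS = [
    # llama.cpp Q4_K: 256-weight super-block, 4-bit quants, 12 bytes of
    # packed 6-bit sub-block scales/mins, plus fp16 d and dmin.
    BlockLayout("GGUF Q4_K", 256, 4, 12 * 8 + 2 * 16),
    # Q5_K: same metadata; the fifth (high) bit is counted in elem_bits.
    BlockLayout("GGUF Q5_K", 256, 5, 12 * 8 + 2 * 16),
    # Assumption: GPTQ/AWQ-style int4, group size 128, fp16 scale + 4-bit zero.
    BlockLayout("int4 g128 (GPTQ/AWQ-style)", 128, 4, 16 + 4),
    # Assumption: FP8 E4M3 with one per-tensor scale, amortized to ~0 bits.
    BlockLayout("FP8 E4M3", 1, 8, 0),
    # MXFP4: 32 FP4 (E2M1) elements share one 8-bit E8M0 power-of-two scale.
    BlockLayout("MXFP4", 32, 4, 8),
    # NVFP4: 16 FP4 (E2M1) elements share one FP8 E4M3 scale
    # (the additional per-tensor FP32 scale is ignored here).
    BlockLayout("NVFP4", 16, 4, 8),
]

for layout in LAYOUTS:
    print(f"{layout.name:<28} {layout.bits_per_weight:.3f} bits/weight")
```

Storage cost is only part of the decision: a layout is practical only where the serving engine ships fast kernels for it on the target hardware, which is why two formats with similar bits per weight can end up on very different deployment paths.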
Python (stdlib): toy memory and throughput comparison across formats.
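A minimal sketch of such a toy comparison, assuming single-stream decoding is memory-bandwidth bound, so every weight is read once per generated token. The 8B parameter count, the two bandwidth figures, and the function names are hypothetical round numbers and helpers chosen for illustration, not measurements of any real device. KV cache, activations, and dequantization cost are ignored, so the tokens/s values are upper bounds rather than predictions.

```python
# Hypothetical round numbers, not measurements of real hardware.
N_PARAMS = 8e9
BANDWIDTH_GB_S = {"CPU (assumed 100 GB/s)": 100.0, "GPU (assumed 3000 GB/s)": 3000.0}

# Bits per weight: FP16 baseline, FP8 with a per-tensor scale, and
# block-level figures for GGUF Q4_K and int4 group-128 (fp16 scale + 4-bit zero).
BITS_PER_WEIGHT = {"FP16": 16.0, "FP8": 8.0, "GGUF Q4_K": 4.5, "int4 g128": 4.15625}


def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Weight storage in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def decode_tok_s_upper_bound(n_params: float, bits_per_weight: float,
                             bandwidth_gb_s: float) -> float:
    """Bandwidth-bound ceiling: one full read of the weights per token."""
    return bandwidth_gb_s / weight_gb(n_params, bits_per_weight)


for fmt, bpw in BITS_PER_WEIGHT.items():
    row = [f"{fmt:<10} {weight_gb(N_PARAMS, bpw):6.2f} GB"]
    for hw, bw in BANDWIDTH_GB_S.items():
        tok_s = decode_tok_s_upper_bound(N_PARAMS, bpw, bw)
        row.append(f"{hw}: {tok_s:7.1f} tok/s")
    print("  ".join(row))
```

Under this model, halving bits per weight halves the bytes read per token and so roughly doubles the ceiling at a fixed bandwidth. Real decode speed also depends on kernel quality, and large batches can shift the workload from bandwidth-bound to compute-bound, where this estimate no longer applies.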