Phase 17: Infrastructure & Production
AI From Scratch / Lesson 07 / ~75 minutes

TensorRT-LLM on Blackwell with FP8 and NVFP4

TensorRT-LLM is NVIDIA-only, but on Blackwell it wins. On GB200 NVL72 with Dynamo orchestration, SemiAnalysis InferenceX measured $0.012 per million tokens for a 120B model in Q1-Q2 2026, against $0.09 per million tokens on H100 with vLLM: a cost gap of roughly 7.5x. T...
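The arithmetic behind that gap is easy to check. Below is a minimal plain-Python sketch that uses only the two prices quoted above. The function names are illustrative, and the 10-billion-token monthly volume is a hypothetical workload, not a measured one.

```python
# Prices quoted in the text (SemiAnalysis InferenceX, 120B model, Q1-Q2 2026),
# in US dollars per million tokens.
GB200_TRTLLM_USD_PER_M = 0.012  # GB200 NVL72 + TensorRT-LLM + Dynamo
H100_VLLM_USD_PER_M = 0.09      # H100 + vLLM


def cost_for_tokens(usd_per_million: float, tokens: int) -> float:
    """Dollar cost of serving `tokens` tokens at a given $/1M-token price."""
    return usd_per_million * tokens / 1_000_000


def cost_ratio(expensive_usd_per_m: float, cheap_usd_per_m: float) -> float:
    """How many times more the expensive setup costs per token."""
    return expensive_usd_per_m / cheap_usd_per_m


if __name__ == "__main__":
    ratio = cost_ratio(H100_VLLM_USD_PER_M, GB200_TRTLLM_USD_PER_M)
    print(f"H100 + vLLM costs {ratio:.1f}x more per token")

    # Hypothetical workload, for illustration only.
    monthly_tokens = 10_000_000_000
    for name, price in [("GB200 + TRT-LLM", GB200_TRTLLM_USD_PER_M),
                        ("H100 + vLLM", H100_VLLM_USD_PER_M)]:
        print(f"{name}: ${cost_for_tokens(price, monthly_tokens):,.2f} per month")
```

With these inputs the ratio is 0.09 / 0.012 = 7.5. The hypothetical 10B-token month costs about $120 on the GB200 setup and about $900 on the H100 setup.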

Python (stdlib toy FP8/NVFP4 memory and cost calculator)
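Here is a minimal stdlib-only sketch of the memory side of such a calculator. It assumes NVFP4 stores 4-bit E2M1 values plus one 8-bit E4M3 scale per 16-value block, which works out to about 4.5 bits per weight. It ignores per-tensor scales, KV cache, activations, and runtime buffers. The names `BITS_PER_WEIGHT` and `weight_gb` are hypothetical and do not come from TensorRT-LLM.

```python
# Effective storage bits per weight, including block-scale overhead.
# Simplifying assumptions:
#   - BF16: 16 bits, no scales.
#   - FP8: 8 bits; per-tensor scale overhead ignored.
#   - NVFP4: 4-bit E2M1 values plus one 8-bit E4M3 scale per 16-value block,
#     i.e. 4 + 8/16 = 4.5 bits; the per-tensor FP32 scale is ignored.
BITS_PER_WEIGHT = {
    "bf16": 16.0,
    "fp8": 8.0,
    "nvfp4": 4.0 + 8.0 / 16,
}


def weight_gb(num_params: float, fmt: str) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes).

    Counts weights only: KV cache, activations and runtime buffers are excluded.
    """
    return num_params * BITS_PER_WEIGHT[fmt] / 8 / 1e9


if __name__ == "__main__":
    params = 120e9  # the 120B model from the text
    base = weight_gb(params, "bf16")
    for fmt in ("bf16", "fp8", "nvfp4"):
        gb = weight_gb(params, fmt)
        print(f"{fmt:>6}: {gb:7.1f} GB of weights ({base / gb:.2f}x vs BF16)")
```

For 120B parameters this gives 240 GB in BF16, 120 GB in FP8, and 67.5 GB in NVFP4. The half-bit of block-scale overhead is why NVFP4 comes out about 3.56x smaller than BF16 rather than a clean 4x.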