AI From Scratch

Phase 11 / 17 lessons / ~17 hours

LLM Engineering

Put LLMs to work in production applications.

Lessons

01 Prompt Engineering: Techniques & Patterns
Most people write prompts like they are texting a friend. Then they wonder why a 200-billion parameter model gives mediocre answers. Prompt engineering is not about tricks. It is about understanding that every token you send is an instruct...
Build / ~90 minutes / Python

02 Few-Shot, Chain-of-Thought, Tree-of-Thought
Telling a model what to do is prompting. Showing it how to think is engineering. The gap between 78% and 91% accuracy on the same model, same task, same data is not a better model. It is a better reasoning strategy.
Build / ~45 minutes / Python

03 Structured Outputs: JSON, Schema Validation, Constrained Decoding
Your LLM returns a string. Your application needs JSON. That gap has crashed more production systems than any model hallucination. Structured output is the bridge between natural language and typed data. Get it right and your LLM becomes a...
Build / ~90 minutes / Python

04 Embeddings & Vector Representations
Text is discrete. Math is continuous. Every time you ask an LLM to find "similar" documents, compare meanings, or search beyond keywords, you're relying on a bridge between these two worlds. That bridge is an embedding. If you don't unders...
Build / ~75 minutes / Python

05 Context Engineering: Windows, Budgets, Memory, and Retrieval
Prompt engineering is a subset. Context engineering is the whole game. A prompt is a string you type. Context is everything that goes into the model's window: system instructions, retrieved documents, tool definitions, conversation history...
Build / ~90 minutes / Python

06 RAG (Retrieval-Augmented Generation)
Your LLM knows everything up to its training cutoff. It knows nothing about your company's docs, your codebase, or last week's meeting notes. RAG solves this by retrieving relevant documents and stuffing them into the prompt. It's the most...
Build / ~90 minutes / Python

07 Advanced RAG (Chunking, Reranking, Hybrid Search)
Basic RAG retrieves the top-k most similar chunks. That works for simple questions. It falls apart for multi-hop reasoning, ambiguous queries, and large corpora. Advanced RAG is the difference between a demo that works on 10 documents and...
Build / ~90 minutes / Python

08 Fine-Tuning with LoRA & QLoRA
Full fine-tuning a 7B model requires 56GB of VRAM. You don't have that. Neither do most companies. LoRA lets you fine-tune the same model in 6GB by training less than 1% of the parameters. This isn't a compromise -- it matches full fine-tu...
Build / ~75 minutes / Python

09 Function Calling & Tool Use
LLMs cannot do anything. They generate text. That is the entire capability. They cannot check the weather, query a database, send an email, run code, or read a file. Every "AI agent" you have ever seen is an LLM generating JSON that says w...
Build / ~75 minutes / Python

10 Evaluation & Testing LLM Applications
You would never deploy a web app without tests. You would never ship a database migration without a rollback plan. But right now, most teams ship LLM applications by reading 10 outputs and saying "yeah, looks good." That is not evaluation....
Build / ~45 minutes / Python

11 Caching, Rate Limiting & Cost Optimization
Most AI startups do not die from bad models. They die from bad unit economics. A single GPT-4o call costs fractions of a cent. Ten thousand users making ten calls per day costs $250 in input tokens alone -- before you charge a single dolla...
Build / ~45 minutes / Python

12 Guardrails, Safety & Content Filtering
Your LLM application will be attacked. Not might. Will. The first prompt injection attempt against your production system will come within 48 hours of launch. The question is not whether someone will try "ignore previous instructions and r...
Build / ~45 minutes / Python

13 Building a Production LLM Application
You have built prompts, embeddings, RAG pipelines, function calling, caching layers, and guardrails. Separately. In isolation. Like practicing guitar scales without ever playing a song. This lesson is the song. You will wire every componen...
Build (Capstone) / ~120 minutes / Python

14 Model Context Protocol (MCP)
Every LLM app built before 2025 invented its own tool schema. Then Anthropic shipped MCP, Claude adopted it, OpenAI adopted it, and by 2026 it is the default wire format for connecting any LLM to any tool, data source, or agent. Write one...
Build / ~75 minutes / Python

15 Prompt Caching and Context Caching
Your system prompt is 4,000 tokens. Your RAG context is 20,000 tokens. You send both with every request. You also pay for both — every time. Prompt caching lets the provider keep that prefix warm on their side and bill you 10% of the norma...
Build / ~60 minutes / Python

16 LangGraph — State Machines for Agents
A ReAct loop written by hand is a while True. A ReAct loop written in LangGraph is a graph you can checkpoint, interrupt, branch, and time-travel through. The agent hasn't changed. The harness around it has.
Build / ~75 minutes / Python

17 Agent Framework Tradeoffs — LangGraph vs CrewAI vs AutoGen vs Agno
Every framework sells the same demo (research agent builds a report) and hides the same bug (state schema fights with the orchestration layer). Pick the framework whose abstractions match the shape of your problem; everything else is glue...
Learn / ~45 minutes / Python