Search

Find the next useful step.

Search across AIByDM learning tracks, lessons, projects, tools, practice games, exam prep, and newsletter issues.

692 indexed items

692 results

track | AI From Scratch | A complete AI engineering curriculum organized into phases, from setup and math foundations through LLMs, agents, infrastructure, safety, and capstone projects. | 20 phases / 503 lessons
phase | Phase 00: Setup & Tooling | Get your environment ready for everything that follows. | 12 lessons / ~14 hours
phase | Phase 01: Math Foundations | The intuition behind every AI algorithm, through code — not textbooks. | 22 lessons / ~23 hours
phase | Phase 02: ML Fundamentals | Classical machine learning — still the backbone of most production AI. | 18 lessons / ~21 hours
phase | Phase 03: Deep Learning Core | Neural networks from first principles. No frameworks until you build one yourself. | 13 lessons / ~15 hours
phase | Phase 04: Computer Vision | From pixels to understanding — image, video, and 3D. | 28 lessons / ~27 hours
phase | Phase 05: NLP — Foundations to Advanced | Language is the interface to intelligence. Master every layer. | 29 lessons / ~30 hours
phase | Phase 06: Speech & Audio | The other half of human communication. Hear, understand, speak. | 17 lessons / ~18 hours
phase | Phase 07: Transformers Deep Dive | The architecture that changed everything. Understand every layer. | 16 lessons / ~14 hours
phase | Phase 08: Generative AI | Create images, video, audio, 3D, and more. | 15 lessons / ~14 hours
phase | Phase 09: Reinforcement Learning | Agents that learn by doing. The foundation of RLHF. | 12 lessons / ~13 hours
phase | Phase 10: LLMs from Scratch | Build, train, and understand large language models. | 24 lessons / ~26 hours
phase | Phase 11: LLM Engineering | Put LLMs to work in production applications. | 17 lessons / ~17 hours
phase | Phase 12: Multimodal AI | Models that see, hear, read, and reason across modalities. | 25 lessons / ~65 hours
phase | Phase 13: Tools & Protocols | The interfaces between AI and the real world. | 23 lessons / ~24.5 hours
phase | Phase 14: Agent Engineering | The core of modern AI engineering. Build agents from first principles. | 42 lessons / ~42 hours
phase | Phase 15: Autonomous Systems | Agents that run without human intervention — safely. | 22 lessons / ~20 hours
phase | Phase 16: Multi-Agent & Swarms | Coordination, emergence, and collective intelligence. | 25 lessons / ~28 hours
phase | Phase 17: Infrastructure & Production | Ship AI to the real world. Scale, monitor, optimize. | 28 lessons / ~32 hours
phase | Phase 18: Ethics, Safety & Alignment | Build AI that helps humanity. Not optional. | 30 lessons / ~31 hours
phase | Phase 19: Capstone Projects | Prove everything you learned. Build portfolio-grade systems. | 85 lessons / ~620 hours
lesson | Dev Environment | Your tools shape your thinking. Set them up once, set them up right. | Phase 00: Setup & Tooling / ~45 minutes
lesson | Git & Collaboration | Version control is not optional. Every experiment, every model, every lesson you build here gets tracked. | Phase 00: Setup & Tooling / ~30 minutes
lesson | GPU Setup & Cloud | Training on CPU is fine for learning. Training for real needs a GPU. | Phase 00: Setup & Tooling / ~45 minutes
lesson | APIs & Keys | Every AI API works the same way: send a request, get a response. The details change, the pattern doesn't. | Phase 00: Setup & Tooling / ~30 minutes
lesson | Jupyter Notebooks | Notebooks are the lab bench of AI engineering. You prototype here, then move what works into production. | Phase 00: Setup & Tooling / ~30 minutes
lesson | Python Environments | Dependency hell is real. Virtual environments are the cure. | Phase 00: Setup & Tooling / ~30 minutes
lesson | Docker for AI | Containers make "works on my machine" a thing of the past. | Phase 00: Setup & Tooling / ~60 minutes
lesson | Editor Setup | Your editor is your co-pilot. Configure it once so it stays out of your way and starts pulling its weight. | Phase 00: Setup & Tooling / ~20 minutes
lesson | Data Management | Data is the fuel. How you manage it determines how fast you go. | Phase 00: Setup & Tooling / ~45 minutes
lesson | Terminal & Shell | The terminal is where AI engineers live. Get comfortable here. | Phase 00: Setup & Tooling / ~35 minutes
lesson | Linux for AI | Most AI runs on Linux. You need to know enough to not be stuck. | Phase 00: Setup & Tooling / ~30 minutes
lesson | Debugging and Profiling | The worst AI bugs don't crash. They train silently on garbage and report a beautiful loss curve. | Phase 00: Setup & Tooling / ~60 minutes
lesson | Linear Algebra Intuition | Every AI model is just matrix math wearing a fancy hat. | Phase 01: Math Foundations / ~60 minutes
lesson | Vectors, Matrices & Operations | Every neural network is just matrix multiplication with extra steps. | Phase 01: Math Foundations / ~60 minutes
lesson | Matrix Transformations | A matrix is a machine that reshapes space. Learn what it does to every point, and you understand the whole transformation. | Phase 01: Math Foundations / ~75 minutes
lesson | Calculus for Machine Learning | Derivatives tell you which way is downhill. That is all a neural network needs to learn. | Phase 01: Math Foundations / ~60 minutes
lesson | Chain Rule & Automatic Differentiation | The chain rule is the engine behind every neural network that learns. | Phase 01: Math Foundations / ~90 minutes
lesson | Probability and Distributions | Probability is the language AI uses to express uncertainty. | Phase 01: Math Foundations / ~75 minutes
lesson | Bayes' Theorem | Probability is about what you expect. Bayes' theorem is about what you learn. | Phase 01: Math Foundations / ~75 minutes
lesson | Optimization | Training a neural network is nothing more than finding the bottom of a valley. | Phase 01: Math Foundations / ~75 minutes
lesson | Information Theory | Information theory measures surprise. Loss functions are built on it. | Phase 01: Math Foundations / ~60 minutes
lesson | Dimensionality Reduction | High-dimensional data has structure. You find it by looking from the right angle. | Phase 01: Math Foundations / ~90 minutes
lesson | Singular Value Decomposition | SVD is the Swiss Army knife of linear algebra. Every matrix has one. Every data scientist needs one. | Phase 01: Math Foundations / ~120 minutes
lesson | Tensor Operations | Tensors are the common language between data and deep learning. Every image, every sentence, every gradient flows through them. | Phase 01: Math Foundations / ~90 minutes
lesson | Numerical Stability | Floating point is a leaky abstraction. It will bite you during training, and you will not see it coming. | Phase 01: Math Foundations / ~120 minutes
lesson | Norms and Distances | Your distance function defines what "similar" means. Choose wrong and everything downstream breaks. | Phase 01: Math Foundations / ~90 minutes
lesson | Statistics for Machine Learning | Statistics is how you know if your model actually works or just got lucky. | Phase 01: Math Foundations / ~120 minutes
lesson | Sampling Methods | Sampling is how AI explores the space of possibilities. | Phase 01: Math Foundations / ~120 minutes
lesson | Linear Systems | Solving Ax = b is the oldest problem in mathematics that still runs your neural network. | Phase 01: Math Foundations / ~120 minutes
lesson | Convex Optimization | Convex problems have one valley. Neural networks have millions. Knowing the difference matters. | Phase 01: Math Foundations / ~90 minutes
lesson | Complex Numbers for AI | The square root of -1 is not imaginary. It is the key to rotations, frequencies, and half of signal processing. | Phase 01: Math Foundations / ~60 minutes
lesson | The Fourier Transform | Every signal is a sum of sine waves. The Fourier transform tells you which ones. | Phase 01: Math Foundations / ~90 minutes
lesson | Graph Theory for Machine Learning | Graphs are the data structure of relationships. If your data has connections, you need graph theory. | Phase 01: Math Foundations / ~90 minutes
lesson | Stochastic Processes | Randomness with structure. The math behind random walks, Markov chains, and diffusion models. | Phase 01: Math Foundations / ~75 minutes
lesson | What Is Machine Learning | Machine learning is teaching computers to find patterns in data instead of writing rules by hand. | Phase 02: ML Fundamentals / ~45 minutes
lesson | Linear Regression | Linear regression draws the best straight line through your data. It is the "hello world" of machine learning. | Phase 02: ML Fundamentals / ~90 minutes
lesson | Logistic Regression | Logistic regression bends a straight line into an S-curve to answer yes-or-no questions with probabilities. | Phase 02: ML Fundamentals / ~90 minutes
lesson | Decision Trees and Random Forests | A decision tree is just a flowchart. But a forest of them is one of the most powerful tools in ML. | Phase 02: ML Fundamentals / ~90 minutes
lesson | Support Vector Machines | Find the widest street between two classes. That is the entire idea. | Phase 02: ML Fundamentals / ~90 minutes
lesson | K-Nearest Neighbors and Distances | Store everything. Predict by looking at your neighbors. The simplest algorithm that actually works. | Phase 02: ML Fundamentals / ~90 minutes
lesson | Unsupervised Learning | No labels, no teacher. The algorithm finds structure on its own. | Phase 02: ML Fundamentals / ~90 minutes
lesson | Feature Engineering & Selection | A good feature is worth a thousand data points. | Phase 02: ML Fundamentals / ~90 minutes
lesson | Model Evaluation | A model is only as good as the way you measure it. | Phase 02: ML Fundamentals / ~90 minutes
lesson | Bias-Variance Tradeoff | Every model error comes from one of three sources: bias, variance, or noise. You can only control the first two. | Phase 02: ML Fundamentals / ~75 minutes
lesson | Ensemble Methods | A group of weak learners, combined correctly, becomes a strong learner. This is not a metaphor. It is a theorem. | Phase 02: ML Fundamentals / ~120 minutes
lesson | Hyperparameter Tuning | Hyperparameters are the knobs you turn before training starts. Turning them well is the difference between a mediocre model and a great one. | Phase 02: ML Fundamentals / ~90 minutes
lesson | ML Pipelines | A model is not a product. A pipeline is. The pipeline is everything from raw data to deployed prediction, and every step must be reproducible. | Phase 02: ML Fundamentals / ~120 minutes
lesson | Naive Bayes | The "naive" assumption is wrong, and it works anyway. That's the beauty of it. | Phase 02: ML Fundamentals / ~75 minutes
lesson | Time Series Fundamentals | Past performance does predict future results -- if you check for stationarity first. | Phase 02: ML Fundamentals / ~90 minutes
lesson | Anomaly Detection | Normal is easy to define. Abnormal is whatever doesn't fit. | Phase 02: ML Fundamentals / ~75 minutes
lesson | Handling Imbalanced Data | When 99% of your data is "normal," accuracy is a lie. | Phase 02: ML Fundamentals / ~90 minutes
lesson | Feature Selection | More features is not better. The right features is better. | Phase 02: ML Fundamentals / ~75 minutes
lesson | The Perceptron | The perceptron is the atom of neural networks. Split it open and you find weights, a bias, and a decision. | Phase 03: Deep Learning Core / ~60 minutes
lesson | Multi-Layer Networks and Forward Pass | One neuron draws a line. Stack them, and you can draw anything. | Phase 03: Deep Learning Core / ~90 minutes
lesson | Backpropagation from Scratch | Backpropagation is the algorithm that makes learning possible. Without it, neural networks are just expensive random number generators. | Phase 03: Deep Learning Core / ~120 minutes
lesson | Activation Functions | Without nonlinearity, your 100-layer network is a fancy matrix multiply. Activations are the gates that let neural networks think in curves. | Phase 03: Deep Learning Core / ~75 minutes
lesson | Loss Functions | Your network makes a prediction. The ground truth says otherwise. How wrong is it? That number is the loss. Pick the wrong loss function and your model optimizes for the wrong thing entirely. | Phase 03: Deep Learning Core / ~75 minutes
lesson | Optimizers | Gradient descent tells you which direction to move. It says nothing about how far or how fast. SGD is a compass. Adam is GPS with traffic data. | Phase 03: Deep Learning Core / ~75 minutes
lesson | Regularization | Your model gets 99% on training data and 60% on test data. It memorized instead of learning. Regularization is the tax you impose on complexity to force generalization. | Phase 03: Deep Learning Core / ~75 minutes
lesson | Weight Initialization and Training Stability | Initialize wrong and training never starts. Initialize right and 50 layers train as smoothly as 3. | Phase 03: Deep Learning Core / ~90 minutes
lesson | Learning Rate Schedules and Warmup | The learning rate is the single most important hyperparameter. Not the architecture. Not the dataset size. Not the activation function. The learning rate. If you tune nothing else, tune this. | Phase 03: Deep Learning Core / ~90 minutes
lesson | Build Your Own Mini Framework | You have built neurons, layers, networks, backprop, activations, loss functions, optimizers, regularization, initialization, and LR schedules. All as separate pieces. Now wire them together into a framework. Not PyTorch. Not TensorFlow. Yo... | Phase 03: Deep Learning Core / ~120 minutes
lesson | Introduction to PyTorch | You built the engine from pistons and crankshafts. Now learn the one everyone actually drives. | Phase 03: Deep Learning Core / ~75 minutes
lesson | Introduction to JAX | PyTorch mutates tensors. TensorFlow builds graphs. JAX compiles pure functions. That last one changes how you think about deep learning. | Phase 03: Deep Learning Core / ~90 minutes
lesson | Debugging Neural Networks | Your network compiled. It ran. It produced a number. The number is wrong and nothing crashed. Welcome to the hardest kind of debugging -- the kind where there is no error message. | Phase 03: Deep Learning Core / ~90 minutes
lesson | Image Fundamentals — Pixels, Channels, Color Spaces | An image is a tensor of light samples. Every vision model you will ever use starts from this one fact. | Phase 04: Computer Vision / ~45 minutes
lesson | Convolutions from Scratch | A convolution is a tiny dense layer you slide across an image, sharing the same weights at every location. | Phase 04: Computer Vision / ~75 minutes
lesson | CNNs — LeNet to ResNet | Every major CNN of the last thirty years is the same conv–nonlinearity–downsample recipe with one new idea bolted on. Learn the ideas in order. | Phase 04: Computer Vision / ~75 minutes
lesson | Image Classification | A classifier is a function from pixels to a probability distribution over classes. Everything else is plumbing. | Phase 04: Computer Vision / ~75 minutes
lesson | Transfer Learning & Fine-Tuning | Somebody else spent a million GPU hours teaching a network what edges, textures, and object parts look like. You should borrow those features before training your own. | Phase 04: Computer Vision / ~75 minutes
lesson | Object Detection — YOLO from Scratch | Detection is classification plus regression, run at every position in a feature map, then cleaned up with non-maximum suppression. | Phase 04: Computer Vision / ~75 minutes
lesson | Semantic Segmentation — U-Net | Segmentation is classification at every pixel. U-Net makes it work by pairing a downsampling encoder with an upsampling decoder and wiring skip connections between them. | Phase 04: Computer Vision / ~75 minutes
lesson | Instance Segmentation — Mask R-CNN | Add a tiny mask branch to a Faster R-CNN detector and you have instance segmentation. The hard part is RoIAlign, and it is harder than it looks. | Phase 04: Computer Vision / ~75 minutes
lesson | Image Generation — GANs | A GAN is two neural networks in a fixed game. One draws, one critiques. They get better together until the drawings fool the critic. | Phase 04: Computer Vision / ~75 minutes
lesson | Image Generation — Diffusion Models | A diffusion model learns to denoise. Train it to remove a tiny bit of noise from a noisy image, repeat that backwards a thousand times, and you have an image generator. | Phase 04: Computer Vision / ~75 minutes
lesson | Stable Diffusion — Architecture & Fine-Tuning | Stable Diffusion is a DDPM that runs in the latent space of a pretrained VAE, conditioned on text via cross-attention, sampled with a fast deterministic ODE solver, and steered by classifier-free guidance. | Phase 04: Computer Vision / ~75 minutes
lesson | Video Understanding — Temporal Modeling | A video is a sequence of images plus the physics that connects them. Every video model either treats time as an extra axis (3D conv), a sequence to attend over (transformer), or a feature to extract once and pool (2D+pool). | Phase 04: Computer Vision / ~45 minutes
lesson | 3D Vision — Point Clouds & NeRFs | 3D vision comes in two flavours. Point clouds are the sensor's raw output. NeRFs are the learned volumetric field. Both answer "what is where in space." | Phase 04: Computer Vision / ~45 minutes
lesson | Vision Transformers (ViT) | Cut the image into patches, treat each patch as a word, run a standard transformer. Don't look back. | Phase 04: Computer Vision / ~45 minutes
lesson | Real-Time Vision — Edge Deployment | Edge inference is the discipline of getting a 90-accuracy model to run at 30 fps on a device with 2 GB of RAM. Every percentage point of accuracy is traded against milliseconds of latency. | Phase 04: Computer Vision / ~75 minutes
lesson | Build a Complete Vision Pipeline — Capstone | A production vision system is a chain of models and rules stitched with data contracts. The pieces are already in this phase; the capstone wires them together end-to-end. | Phase 04: Computer Vision / ~120 minutes
lesson | Self-Supervised Vision — SimCLR, DINO, MAE | Labels are the bottleneck of supervised vision. Self-supervised pretraining removes them: learn visual features from 100M unlabelled images, fine-tune on 10k labelled ones. | Phase 04: Computer Vision / ~75 minutes
lesson | Open-Vocabulary Vision — CLIP | Train an image encoder and a text encoder together so that matching (image, caption) pairs land at the same point in a shared space. That is the whole trick. | Phase 04: Computer Vision / ~45 minutes
lesson | OCR & Document Understanding | OCR is a three-stage pipeline — detect text boxes, recognise the characters, then lay them out. Every modern OCR system reorders these stages or merges them. | Phase 04: Computer Vision / ~45 minutes
lesson | Image Retrieval & Metric Learning | A retrieval system ranks candidates by a distance in embedding space. Metric learning is the discipline of shaping that space so the distances mean what you want. | Phase 04: Computer Vision / ~45 minutes
lesson | Keypoint Detection & Pose Estimation | A pose is a set of ordered keypoints. A keypoint detector is a heatmap regressor. Everything else is bookkeeping. | Phase 04: Computer Vision / ~45 minutes
lesson | 3D Gaussian Splatting from Scratch | A scene is a cloud of millions of 3D Gaussians. Each one has a position, orientation, scale, opacity, and a colour that depends on viewing direction. Rasterise them, backprop through the rasterisation, done. | Phase 04: Computer Vision / ~90 minutes
lesson | Diffusion Transformers & Rectified Flow | The U-Net is not the secret of diffusion. Replace it with a transformer, swap the noise schedule for a straight-line flow, and suddenly you have SD3, FLUX, and every 2026 text-to-image model. | Phase 04: Computer Vision / ~75 minutes
lesson | SAM 3 & Open-Vocabulary Segmentation | Give a model a text prompt and an image and get masks for every matching object. SAM 3 made that a single forward pass. | Phase 04: Computer Vision / ~60 minutes
lesson | Vision-Language Models — The ViT-MLP-LLM Pattern | A vision encoder converts an image into tokens. An MLP projector maps those tokens into the LLM's embedding space. A language model does the rest. That pattern — ViT-MLP-LLM — is every production VLM in 2026. | Phase 04: Computer Vision / ~75 minutes
lesson | Monocular Depth & Geometry Estimation | A depth map is a single-channel image where each pixel is a distance from the camera. Predicting it from one RGB frame used to be impossible without stereo or LiDAR. In 2026 a frozen ViT encoder plus a lightweight head gets within a few pe... | Phase 04: Computer Vision / ~60 minutes
lesson | Multi-Object Tracking & Video Memory | Tracking is detection plus association. Detect every frame. Match this frame's detections to last frame's tracks by ID. | Phase 04: Computer Vision / ~60 minutes
lesson | World Models & Video Diffusion | A video model that predicts the next seconds of a scene is a world simulator. Condition that prediction on actions and you have a learned game engine. | Phase 04: Computer Vision / ~75 minutes
lesson | Text Processing — Tokenization, Stemming, Lemmatization | Language is continuous. Models are discrete. Preprocessing is the bridge. | Phase 05: NLP — Foundations to Advanced / ~45 minutes
lesson | Bag of Words, TF-IDF, and Text Representation | Count first, think later. TF-IDF still beats embeddings on well-defined tasks in 2026. | Phase 05: NLP — Foundations to Advanced / ~75 minutes
lesson | Word Embeddings — Word2Vec from Scratch | A word is the company it keeps. Train a shallow net on that idea and geometry falls out. | Phase 05: NLP — Foundations to Advanced / ~75 minutes
lesson | GloVe, FastText, and Subword Embeddings | Word2Vec trained one embedding per word. GloVe factorized the co-occurrence matrix. FastText embedded the pieces. BPE bridged to transformers. | Phase 05: NLP — Foundations to Advanced / ~45 minutes
lesson | Sentiment Analysis | The canonical NLP task. Most of what you need to know about classical text classification shows up here. | Phase 05: NLP — Foundations to Advanced / ~75 minutes
lesson | Named Entity Recognition | Pull the names out. Sounds easy until you deal with ambiguous boundaries, nested entities, and domain jargon. | Phase 05: NLP — Foundations to Advanced / ~75 minutes
lesson | POS Tagging and Syntactic Parsing | Grammar was unfashionable for a while. Then every LLM pipeline needed to validate structured extraction, and it came back. | Phase 05: NLP — Foundations to Advanced / ~45 minutes
lesson | CNNs and RNNs for Text | Convolutions learn n-grams. Recurrences remember. Both are superseded by attention. Both still matter on constrained hardware. | Phase 05: NLP — Foundations to Advanced / ~75 minutes
lesson | Sequence-to-Sequence Models | Two RNNs pretending to be a translator. The bottleneck they hit is the reason attention exists. | Phase 05: NLP — Foundations to Advanced / ~75 minutes
lesson | Attention Mechanism — The Breakthrough | The decoder stops squinting at a compressed summary and starts looking at the whole source. Everything after this is attention plus engineering. | Phase 05: NLP — Foundations to Advanced / ~45 minutes
lesson | Machine Translation | Translation is the task that paid for NLP research for thirty years and keeps paying now. | Phase 05: NLP — Foundations to Advanced / ~75 minutes
lesson | Text Summarization | Extractive systems tell you what the document said. Abstractive systems tell you what the author meant. Different tasks, different pitfalls. | Phase 05: NLP — Foundations to Advanced / ~75 minutes
lesson | Question Answering Systems | Three systems shaped modern QA. Extractive found spans. Retrieval-augmented grounded them in documents. Generative produced answers. Every modern AI assistant is a mix of the three. | Phase 05: NLP — Foundations to Advanced / ~75 minutes
lesson | Information Retrieval and Search | BM25 is precise but brittle. Dense casts a wide net but misses keywords. Hybrid is the 2026 default. Everything else is tuning. | Phase 05: NLP — Foundations to Advanced / ~75 minutes
lesson | Topic Modeling — LDA and BERTopic | LDA: documents are mixtures of topics, topics are distributions over words. BERTopic: documents cluster in embedding space, clusters are topics. Same goal, different decompositions. | Phase 05: NLP — Foundations to Advanced / ~45 minutes
lesson | Text Generation Before Transformers — N-gram Language Models | If a word is surprising, the model is bad. Perplexity makes surprise a number. Smoothing keeps it finite. | Phase 05: NLP — Foundations to Advanced / ~45 minutes
lesson | Chatbots — Rule-Based to Neural to LLM Agents | ELIZA replied with pattern matches. DialogFlow mapped intents. GPT answered from weights. Claude runs tools and verifies. Each era solved the previous one's worst failure. | Phase 05: NLP — Foundations to Advanced / ~75 minutes
lesson | Multilingual NLP | One model, 100+ languages, zero training data for most of them. Cross-lingual transfer is the practical miracle of the 2020s. | Phase 05: NLP — Foundations to Advanced / ~45 minutes
lesson | Subword Tokenization — BPE, WordPiece, Unigram, SentencePiece | Word tokenizers choke on unseen words. Character tokenizers blow up sequence length. Subword tokenizers split the difference. Every modern LLM ships on one. | Phase 05: NLP — Foundations to Advanced / ~60 minutes
lesson | Structured Outputs & Constrained Decoding | Ask an LLM for JSON. Get JSON most of the time. In production, "most" is the problem. Constrained decoding turns "most" into "always" by editing the logits before sampling. | Phase 05: NLP — Foundations to Advanced / ~60 minutes
lesson | Natural Language Inference — Textual Entailment | "t entails h" means a human reading t would conclude h is true. NLI is the task of predicting entailment / contradiction / neutral. Boring on the surface, load-bearing in production. | Phase 05: NLP — Foundations to Advanced / ~60 minutes
lesson | Embedding Models — The 2026 Deep Dive | Word2Vec gave you a vector per word. Modern embedding models give you a vector per passage, cross-lingual, with sparse, dense, and multi-vector views, sized to fit your index. Pick wrong and your RAG retrieves the wrong thing. | Phase 05: NLP — Foundations to Advanced / ~60 minutes
lesson | Chunking Strategies for RAG | Chunking configuration influences retrieval quality as much as the choice of embedding model (Vectara NAACL 2025). Get chunking wrong and no amount of reranking saves you. | Phase 05: NLP — Foundations to Advanced / ~60 minutes
lesson | Coreference Resolution | "She called him. He did not answer. The doctor was at lunch." Three references to two people and nobody is named. Coreference resolution figures out who is who. | Phase 05: NLP — Foundations to Advanced / ~60 minutes
lesson | Entity Linking & Disambiguation | NER found "Paris." Entity linking decides: Paris, France? Paris Hilton? Paris, Texas? Paris (the Trojan prince)? Without linking, your knowledge graph stays ambiguous. | Phase 05: NLP — Foundations to Advanced / ~60 minutes
lesson | Relation Extraction & Knowledge Graph Construction | NER found the entities. Entity linking anchored them. Relation extraction finds the edges between them. A knowledge graph is the sum of nodes, edges, and their provenance. | Phase 05: NLP — Foundations to Advanced / ~60 minutes
lesson | LLM Evaluation — RAGAS, DeepEval, G-Eval | Exact-match and F1 miss semantic equivalence. Human review does not scale. LLM-as-judge is the production answer — with enough calibration to trust the number. | Phase 05: NLP — Foundations to Advanced / ~75 minutes
lesson | Long-Context Evaluation — NIAH, RULER, LongBench, MRCR | Gemini 3 Pro advertises 10M tokens of context. At 1M tokens, 8-needle MRCR drops to 26.3%. Advertised ≠ usable. Long-context evaluation tells you the actual capacity of the model you are shipping on. | Phase 05: NLP — Foundations to Advanced / ~60 minutes
lesson | Dialogue State Tracking | "I want a cheap restaurant in the north... actually make it moderate... and add Italian." Three turns, three state updates. DST keeps the slot-value dict in sync so the booking works. | Phase 05: NLP — Foundations to Advanced / ~75 minutes
lesson | Audio Fundamentals — Waveforms, Sampling, Fourier Transform | Waveforms are the raw signal. Spectrograms are the representation. Mel features are the ML-friendly form. Every modern ASR and TTS pipeline walks this ladder, and the first rung is understanding sampling and Fourier. | Phase 06: Speech & Audio / ~45 minutes
lesson | Spectrograms, Mel Scale & Audio Features | Neural nets do not consume raw waveforms well. They consume spectrograms. They consume mel spectrograms even better. Every ASR, TTS, and audio classifier in 2026 lives or dies by this single preprocessing choice. | Phase 06: Speech & Audio / ~45 minutes
lesson | Audio Classification — From k-NN on MFCCs to AST and BEATs | Everything from "dog barking vs siren" to "which language is this" is audio classification. The features are mels. The architecture moves each decade. The evaluation stays AUC, F1, and per-class recall. | Phase 06: Speech & Audio / ~75 minutes
lesson | Speech Recognition (ASR) — CTC, RNN-T, Attention | Speech recognition is audio classification at every timestep, glued together by a sequence model that knows English and silence. CTC, RNN-T, and attention are the three ways to do it. Pick one and understand why. | Phase 06: Speech & Audio / ~45 minutes
lesson | Whisper — Architecture & Fine-Tuning | Whisper is a 30-second-window transformer encoder-decoder, trained on 680k hours of multilingual weakly-supervised audio-text pairs. One architecture, multiple tasks, robust across 99 languages. The 2026 reference ASR. | Phase 06: Speech & Audio / ~75 minutes
lesson | Speaker Recognition & Verification | ASR asks "what did they say?" Speaker recognition asks "who said it?" The math looks the same — embeddings plus cosine — but every production decision hinges on a single EER number. | Phase 06: Speech & Audio / ~45 minutes
lesson | Text-to-Speech (TTS) — From Tacotron to F5 and Kokoro | ASR inverts speech to text; TTS inverts text to speech. The 2026 stack is three parts: text → tokens, tokens → mel, mel → waveform. Each part has a default model that fits in a laptop. | Phase 06: Speech & Audio / ~75 minutes
lesson | Voice Cloning & Voice Conversion | Voice cloning reads your text in someone else's voice. Voice conversion rewrites your voice into someone else's while preserving what you said. Both hang on the same decomposition: separate speaker identity from content. | Phase 06: Speech & Audio / ~75 minutes
lesson | Music Generation — MusicGen, Stable Audio, Suno, and the Licensing Earthquake | 2026 music generation: Suno v5 and Udio v4 dominate commercial; MusicGen, Stable Audio Open, and ACE-Step lead open-source. The technical problem is mostly solved. The legal problem (Warner Music $500M settlement, UMG settlement) reshaped... | Phase 06: Speech & Audio / ~75 minutes
lesson | Audio-Language Models — Qwen2.5-Omni, Audio Flamingo, GPT-4o Audio | 2026 audio-language models reason over speech + environmental sound + music. Qwen2.5-Omni-7B matches GPT-4o Audio on MMAU-Pro. Audio Flamingo Next beats Gemini 2.5 Pro on LongAudioBench. The gap between open and closed is essentially close... | Phase 06: Speech & Audio / ~45 minutes
lesson | Real-Time Audio Processing | Batch pipelines process a file. Real-time pipelines process the next 20 milliseconds before the next 20 arrive. Every conversational AI, broadcast studio, and telephony bot lives and dies by this latency budget. | Phase 06: Speech & Audio / ~75 minutes
lesson | Build a Voice Assistant Pipeline — The Phase 6 Capstone | Everything from lessons 01-11, stitched together. Build a voice assistant that listens, reasons, and talks back. In 2026 that is a solved engineering problem, not a research problem — but the integration details decide whether it ships. | Phase 06: Speech & Audio / ~120 minutes
lesson | Neural Audio Codecs — EnCodec, SNAC, Mimi, DAC and the Semantic-Acoustic Split | 2026 audio generation is almost all tokens. EnCodec, SNAC, Mimi, and DAC turn continuous waveforms into discrete sequences that a transformer can predict. The semantic-vs-acoustic token split — first-codebook as semantic, rest as acoustic... | Phase 06: Speech & Audio / ~60 minutes
lesson | Voice Activity Detection & Turn-Taking — Silero, Cobra, and the Flush Trick | Every voice agent lives or dies on two decisions: is the user speaking now, and are they done? VAD answers the first. Turn-detection (VAD + silence-hangover + semantic endpoint model) answers the second. Get either wrong and your assistant... | Phase 06: Speech & Audio / ~45 minutes
lesson | Streaming Speech-to-Speech — Moshi, Hibiki, and Full-Duplex Dialogue | 2024-2026 redefined voice AI. Moshi ships a single model that listens and speaks simultaneously at 200 ms latency. Hibiki does speech-to-speech translation chunk-by-chunk. Both abandon the ASR → LLM → TTS pipeline for a unified full-duplex... | Phase 06: Speech & Audio / ~75 minutes
lesson | Voice Anti-Spoofing & Audio Watermarking — ASVspoof 5, AudioSeal, WaveVerify | Voice cloning shipped faster than defenses. 2026 production voice systems need two things: a detector (AASIST, RawNet2) that classifies real vs fake speech, and a watermark (AudioSeal) that survives compression and editing. Ship both or do... | Phase 06: Speech & Audio / ~75 minutes
lesson | Audio Evaluation — WER, MOS, UTMOS, MMAU, FAD, and the Open Leaderboards | You cannot ship what you cannot measure. This lesson names the 2026 metrics for every audio task: ASR (WER, CER, RTFx), TTS (MOS, UTMOS, SECS, WER-on-ASR-round-trip), audio-language (MMAU, LongAudioBench), music (FAD, CLAP), and speaker (E... | Phase 06: Speech & Audio / ~60 minutes
lesson | Why Transformers — The Problems with RNNs | RNNs process tokens one at a time. Transformers process all tokens at once. That single architectural bet changed every scaling curve in deep learning after 2017. | Phase 07: Transformers Deep Dive / ~45 minutes
lesson | Self-Attention from Scratch | Attention is a lookup table where every word asks "who matters to me?" - and learns the answer. | Phase 07: Transformers Deep Dive / ~90 minutes
lesson | Multi-Head Attention | One attention head learns one relation at a time. Eight heads learn eight. Heads are free. Take more of them. | Phase 07: Transformers Deep Dive / ~75 minutes
lesson | Positional Encoding — Sinusoidal, RoPE, ALiBi | Attention is permutation-invariant. "The cat sat on the mat" and "mat the on sat cat the" produce the same output without positional signal. Three algorithms fix it — each with a different bet on what "position" means. | Phase 07: Transformers Deep Dive / ~45 minutes
lesson | The Full Transformer — Encoder + Decoder | Attention is the star. Everything else — residuals, normalization, feed-forward, cross-attention — is the scaffolding that lets you stack it deep. | Phase 07: Transformers Deep Dive / ~75 minutes
lesson | BERT — Masked Language Modeling | GPT predicts the next word. BERT predicts a missing word. One sentence of difference — and half a decade of everything embedding-shaped. | Phase 07: Transformers Deep Dive / ~45 minutes
lesson | GPT — Causal Language Modeling | BERT sees both sides. GPT sees only the past. The triangle mask is the most consequential single line of code in modern AI. | Phase 07: Transformers Deep Dive / ~75 minutes
lesson | T5, BART — Encoder-Decoder Models | Encoders understand. Decoders generate. Put them back together and you get a model built for input → output tasks: translate, summarize, rewrite, transcribe. | Phase 07: Transformers Deep Dive / ~45 minutes
lesson | Vision Transformers (ViT) | An image is a grid of patches. A sentence is a grid of tokens. The same transformer eats both. | Phase 07: Transformers Deep Dive / ~45 minutes
lesson | Audio Transformers — Whisper Architecture | Audio is an image of frequency over time. Whisper is a ViT that eats mel spectrograms and speaks back. | Phase 07: Transformers Deep Dive / ~45 minutes
lesson | Mixture of Experts (MoE) | A dense 70B transformer activates every parameter for every token. A 671B MoE activates only 37B per token and beats it on every benchmark. Sparsity is the most important scaling idea of the decade. | Phase 07: Transformers Deep Dive / ~45 minutes
lesson | KV Cache, Flash Attention & Inference Optimization | Training is parallel and FLOP-bound. Inference is serial and memory-bound. Different bottleneck, different tricks. | Phase 07: Transformers Deep Dive / ~75 minutes
lesson | Scaling Laws | The 2020 Kaplan paper said: bigger model, lower loss. The 2022 Hoffmann paper said: you were under-training. Compute goes into two buckets — parameters and tokens — and the split is not obvious. | Phase 07: Transformers Deep Dive / ~45 minutes
lesson | Build a Transformer from Scratch — The Capstone | Thirteen lessons. One model. No shortcuts. | Phase 07: Transformers Deep Dive / ~120 minutes
lesson | Attention Variants — Sliding Window, Sparse, Differential | Full attention is a circle. Every token sees every token, and memory pays the price. Four variants bend the shape of the circle and recover half the cost. | Phase 07: Transformers Deep Dive / ~60 minutes
lesson | Speculative Decoding — Draft, Verify, Repeat | Autoregressive decoding is serial. Each token waits for the previous one. Speculative decoding breaks the chain: a cheap model drafts N tokens, the expensive model verifies all N in one forward pass.
When the draft is right you paid one bi...Phase 07: Transformers Deep Dive / ~60 minutes lessonGenerative Models — Taxonomy & HistoryEvery image model, text model, video model, and 3D model fits in one of five buckets. Pick the wrong bucket and you will fight the math for weeks. Pick the right one and the field's last twelve years of progress stacks cleanly in your head.Phase 08: Generative AI / ~45 minutes lessonAutoencoders & Variational Autoencoders (VAE)A plain autoencoder compresses then reconstructs. It memorizes. It does not generate. Add one trick — force the code to look Gaussian — and you get a sampler. That single trick, the reparameterization of z = μ + σ·ε, is why every latent-di...Phase 08: Generative AI / ~75 minutes lessonGANs — Generator vs DiscriminatorGoodfellow's trick in 2014 was to skip density entirely. Two networks. One makes fakes. One catches them. They fight until the fakes are indistinguishable from real. It shouldn't work. It often doesn't. When it does, the samples are still...Phase 08: Generative AI / ~75 minutes lessonConditional GANs & Pix2PixThe first big unlock of 2014-2017 was controlling what a GAN makes. Attach a label, or an image, or a sentence. Pix2Pix did the image version and it still beats every generic text-to-image model on narrow image-to-image tasks.Phase 08: Generative AI / ~75 minutes lessonStyleGANMost generators stir z into every layer at the same time. StyleGAN split it apart: first map z to an intermediate w, then inject w at every resolution level through AdaIN. That single change untangled the latent space and made photorealist...Phase 08: Generative AI / ~45 minutes lessonDiffusion Models — DDPM from ScratchHo, Jain, Abbeel (2020) gave the field a recipe it could not quit. Destroy the data with noise over a thousand small steps. Train one neural net to predict the noise. Reverse the process at inference. 
Today every mainstream image, video, 3...Phase 08: Generative AI / ~75 minutes lessonLatent Diffusion & Stable DiffusionPixel-space diffusion on 512×512 images is a computational war crime. Rombach et al. (2022) noticed that you do not need all 786k dimensions to generate an image — you need enough to capture semantic structure, and a separate decoder for t...Phase 08: Generative AI / ~75 minutes lessonControlNet, LoRA & ConditioningText alone is a clumsy control signal. ControlNet lets you clone a pretrained diffusion model and steer it with a depth map, pose skeleton, scribble, or edge image. LoRA lets you fine-tune a 2B-parameter model by training 10 million parame...Phase 08: Generative AI / ~75 minutes lessonInpainting, Outpainting & Image EditingText-to-image makes new things. Inpainting fixes old ones. In production, 70% of billable image work is editing — swap a background, remove a logo, extend the canvas, regenerate a hand. Inpainting is where diffusion earns its keep.Phase 08: Generative AI / ~75 minutes lessonVideo GenerationAn image is a 2-D tensor. A video is a 3-D one. The theory is the same; the compute is 10-100x harder. OpenAI's Sora (Feb 2024) proved it was possible. By 2026 Veo 2, Kling 1.5, Runway Gen-3, Pika 2.0, and WAN 2.2 ship production video fro...Phase 08: Generative AI / ~45 minutes lessonAudio GenerationAudio is a 1-D signal at 16-48 kHz. A five-second clip is 80-240k samples. No transformer attends to that sequence directly. The solution for every production audio model in 2026 is the same: a neural codec (Encodec, SoundStream, DAC) comp...Phase 08: Generative AI / ~45 minutes lesson3D Generation3D is the modality where 2D-to-3D leverage is strongest. The 2023 breakthrough was 3D Gaussian Splatting. 
The 2024-2026 generative push layers multi-view diffusion + 3D reconstruction on top to produce objects and scenes from a single prom...Phase 08: Generative AI / ~45 minutes lessonFlow Matching & Rectified FlowsDiffusion models take 20-50 sampling steps because they walk a curved path from noise to data. Flow matching (Lipman et al., 2023) and rectified flow (Liu et al., 2022) train straight paths. Straighter paths mean fewer steps mean faster...Phase 08: Generative AI / ~45 minutes lessonEvaluation — FID, CLIP Score, Human PreferenceEvery generative model leaderboard cites FID, CLIP score, and a win rate from a human-preference arena. Each number has a failure mode a determined researcher can game. If you do not know the failure modes, you cannot tell a real improveme...Phase 08: Generative AI / ~45 minutes lessonVisual Autoregressive Modeling (VAR): Next-Scale PredictionDiffusion models sample iteratively in time (denoising steps). VAR samples iteratively in scale — it predicts a 1x1 token, then 2x2, then 4x4, up to the final resolution, each scale conditioning on the previous. The 2024 paper showed VAR m...Phase 08: Generative AI / ~90 minutes lessonMDPs, States, Actions & RewardsA Markov Decision Process is five things: states, actions, transitions, rewards, a discount. Everything in RL — Q-learning, PPO, DPO, GRPO — optimizes over this shape. Learn it once, read the rest of reinforcement learning for free.Phase 09: Reinforcement Learning / ~45 minutes lessonDynamic Programming — Policy Iteration & Value IterationDynamic programming is RL with cheating. You already know the transition and reward functions; you just iterate the Bellman equation until V or π stops moving. It is the benchmark every sampling-based method tries to approach.Phase 09: Reinforcement Learning / ~75 minutes lessonMonte Carlo Methods — Learning from Complete EpisodesDynamic programming needs a model. Monte Carlo needs nothing but episodes. 
Run the policy, watch the returns, average them. The simplest idea in RL — and the one that unlocks everything downstream.Phase 09: Reinforcement Learning / ~75 minutes lessonTemporal Difference — Q-Learning & SARSAMonte Carlo waits until the episode ends. TD updates after every step by bootstrapping the next value estimate. Q-learning is off-policy and optimistic; SARSA is on-policy and cautious. Both are one line of code. Both underpin every deep-R...Phase 09: Reinforcement Learning / ~75 minutes lessonDeep Q-Networks (DQN)2013: Mnih trained one Q-learning network on raw pixels, beat every classical RL agent on seven Atari games. 2015: extended to 49 games, published in Nature, sparked the deep-RL era. DQN is Q-learning plus three tricks that make function a...Phase 09: Reinforcement Learning / ~75 minutes lessonPolicy Gradient — REINFORCE from ScratchStop estimating value. Parameterize the policy directly, compute the gradient of expected return, step uphill. Williams (1992) wrote it in one theorem. It is why PPO, GRPO, and every LLM RL loop exist.Phase 09: Reinforcement Learning / ~75 minutes lessonActor-Critic — A2C and A3CREINFORCE is noisy. Add a critic that learns V̂(s), subtract it from the return, and you get an advantage that has the same expectation but far lower variance. That is actor-critic. A2C runs it synchronously; A3C runs it across threads. Bo...Phase 09: Reinforcement Learning / ~75 minutes lessonProximal Policy Optimization (PPO)A2C throws away each rollout after one update. PPO wraps the policy gradient in a clipped importance ratio so you can do 10+ epochs on the same data without the policy exploding. Schulman et al. (2017). Still the default policy-gradient al...Phase 09: Reinforcement Learning / ~75 minutes lessonReward Modeling & RLHFHumans cannot write a reward function for "good assistant response," but they can compare two responses and pick the better one. 
Fit a reward model to those comparisons, then RL the language model against it. Christiano 2017. InstructGPT 2...Phase 09: Reinforcement Learning / ~45 minutes lessonMulti-Agent RLSingle-agent RL assumes the environment is stationary. Put two learning agents in the same world and that assumption breaks: each agent is part of the other's environment, and both are changing. Multi-agent RL is the set of tricks to make...Phase 09: Reinforcement Learning / ~45 minutes lessonSim-to-Real TransferA policy trained in a simulator that fails on hardware is a policy that memorized the simulator. Domain randomization, domain adaptation, and system identification are the three tools to make learned controllers cross the reality gap.Phase 09: Reinforcement Learning / ~45 minutes lessonRL for Games — AlphaZero, MuZero, and the LLM-Reasoning Era1992: TD-Gammon beat human champions at backgammon with pure TD. 2016: AlphaGo beat Lee Sedol. 2017: AlphaZero dominated chess, shogi, and Go from scratch. 2025: DeepSeek-R1 proved the same recipe, with GRPO replacing PPO, works on reasoni...Phase 09: Reinforcement Learning / ~120 minutes lessonTokenizers: BPE, WordPiece, SentencePieceYour LLM does not read English. It reads integers. The tokenizer decides whether those integers carry meaning or waste it.Phase 10: LLMs from Scratch / ~90 minutes lessonBuilding a Tokenizer from ScratchLesson 01 gave you a toy. This lesson gives you a weapon.Phase 10: LLMs from Scratch / ~90 minutes lessonData Pipelines for Pre-TrainingThe model is a mirror. It reflects whatever data you feed it. Feed it garbage, it reflects garbage with perfect fluency.Phase 10: LLMs from Scratch / ~90 minutes lessonPre-Training a Mini GPT (124M Parameters)GPT-2 Small has 124 million parameters. That's 12 transformer layers, 12 attention heads, and 768-dimensional embeddings. You can train it from scratch on a single GPU in a few hours. Most people never do this. 
They use pre-trained checkpo...Phase 10: LLMs from Scratch / ~120 minutes lessonScaling: Distributed Training, FSDP, DeepSpeedYour 124M model trained on one GPU. Now try 7 billion parameters. The model doesn't fit in memory. The data takes weeks on a single machine. Distributed training isn't optional at scale. It's the only path forward.Phase 10: LLMs from Scratch / ~120 minutes lessonInstruction Tuning (SFT)A base model predicts the next token. That's it. It doesn't follow instructions, answer questions, or refuse harmful requests. SFT is the bridge between a token predictor and a useful assistant. Every model you've ever talked to -- Claude,...Phase 10: LLMs from Scratch / ~90 minutes lessonRLHF: Reward Model + PPOSFT teaches the model to follow instructions. But it doesn't teach the model which response is BETTER. Two grammatically correct, factually accurate answers can differ enormously in helpfulness. RLHF is how you encode human judgment into t...Phase 10: LLMs from Scratch / ~90 minutes lessonDPO: Direct Preference OptimizationRLHF works. It also requires training three models (SFT, reward model, policy), managing PPO's instability, and tuning a KL penalty. DPO asks: what if you could skip all of that? DPO directly optimizes the language model on preference pair...Phase 10: LLMs from Scratch / ~90 minutes lessonConstitutional AI and Self-ImprovementRLHF needs humans in the loop. Constitutional AI replaces most of them with the model itself. Write a list of principles, have the model critique its own outputs against those principles, and train on the critiques. DeepSeek-R1 pushed this...Phase 10: LLMs from Scratch / ~45 minutes lessonEvaluation: Benchmarks, Evals, LM HarnessGoodhart's Law: when a measure becomes a target, it ceases to be a good measure. Every frontier lab games benchmarks. MMLU scores go up while models still can't reliably count the number of R's in "strawberry." 
The only eval that matters i...Phase 10: LLMs from Scratch / ~90 minutes lessonQuantization: Making Models FitA 70B model in FP16 needs 140GB. Two A100s just for weights. Quantize to FP8: one 80GB GPU. INT4: a MacBook.Phase 10: LLMs from Scratch / ~120 minutes lessonInference OptimizationTwo phases define LLM inference. Prefill processes your prompt in parallel -- compute-bound. Decode generates tokens one at a time -- memory-bound. Every optimization targets one or both.Phase 10: LLMs from Scratch / ~120 minutes lessonBuilding a Complete LLM PipelineEverything from Lessons 01 to 12 is one stage of one pipeline. This lesson is the scaffold that turns those stages into a single end-to-end run: tokenize, pre-train, scale, SFT, align, evaluate, quantize, serve. You will not train a 70B mo...Phase 10: LLMs from Scratch / ~120 minutes lessonOpen Models: Architecture WalkthroughsYou built a GPT-2 Small from scratch in Lesson 04. Frontier open models in 2026 are the same family with five or six concrete changes. RMSNorm instead of LayerNorm. SwiGLU instead of GELU. RoPE instead of learned positions. GQA or MLA inst...Phase 10: LLMs from Scratch / ~45 minutes lessonSpeculative Decoding and EAGLE-3Phase 7 · Lesson 16 proved the math: the Leviathan rejection rule preserves the verifier's distribution exactly. This lesson is the training-stack view of 2026 production speculative decoding. EAGLE-3 turned the draft model from a cheap ap...Phase 10: LLMs from Scratch / ~75 minutes lessonDifferential Attention (V2)Softmax attention spreads a small amount of probability over every non-matching token. Over 100k tokens that noise adds up and drowns the signal. Differential Transformer (Ye et al., ICLR 2025) fixes it by computing attention as the differ...Phase 10: LLMs from Scratch / ~60 minutes lessonNative Sparse Attention (DeepSeek NSA)At 64k tokens, attention eats 70-80% of decode latency. Every open-model lab has a plan to fix it. 
DeepSeek's NSA (ACL 2025 best paper) is the one that stuck: three parallel attention branches — compressed coarse-grained tokens, selectivel...Phase 10: LLMs from Scratch / ~60 minutes lessonMulti-Token Prediction (MTP)Every autoregressive LLM from GPT-2 to Llama 3 trains on one loss per position: predict the next token. DeepSeek-V3 added a second loss per position: predict the token after that. The extra 14B of parameters (on a 671B model) got distilled...Phase 10: LLMs from Scratch / ~60 minutes lessonDualPipe ParallelismDeepSeek-V3 was trained on 2,048 H800 GPUs with MoE experts scattered across nodes. Cross-node expert all-to-all communication cost 1 GPU-hour of comm for every 1 GPU-hour of compute. GPUs were idle half the time. DualPipe (DeepSeek, Dec 2...Phase 10: LLMs from Scratch / ~60 minutes lessonDeepSeek-V3 Architecture WalkthroughPhase 10 · Lesson 14 named the six architectural knobs every open model turns. DeepSeek-V3 (December 2024, 671B parameters total, 37B active) turns all six and adds four more: Multi-Head Latent Attention, auxiliary-loss-free load balancing...Phase 10: LLMs from Scratch / ~75 minutes lessonJamba — Hybrid SSM-TransformerState space models (SSMs) and transformers want different things. Transformers buy quality via attention at quadratic cost. SSMs buy linear-time inference and constant memory via a recurrence but lag on quality. AI21's Jamba (March 2024) and...Phase 10: LLMs from Scratch / ~60 minutes lessonAsync and Hogwild! InferenceSpeculative decoding (Phase 10 · 15) parallelizes tokens within one sequence. Multi-agent frameworks parallelize across whole sequences but force explicit coordination (voting, sub-task splitting). Hogwild! Inference (Rodionov et al., arXi...Phase 10: LLMs from Scratch / ~60 minutes lessonSpeculative Decoding and EAGLEA frontier LLM generating one token requires a full forward pass over billions of parameters. 
That forward pass is massively over-provisioned: most of the time a much smaller model can guess the next 3-5 tokens correctly, and the big model...Phase 10: LLMs from Scratch / ~75 minutes lessonGradient Checkpointing and Activation RecomputationBackprop keeps every intermediate activation. At 70B parameters and 128K context that is 3 TB of activations per rank. Checkpointing trades FLOPs for memory: recompute instead of save. The question is which segments to drop, and the answer...Phase 10: LLMs from Scratch / ~70 minutes lessonPrompt Engineering: Techniques & PatternsMost people write prompts like they are texting a friend. Then they wonder why a 200-billion parameter model gives mediocre answers. Prompt engineering is not about tricks. It is about understanding that every token you send is an instruct...Phase 11: LLM Engineering / ~90 minutes lessonFew-Shot, Chain-of-Thought, Tree-of-ThoughtTelling a model what to do is prompting. Showing it how to think is engineering. The gap between 78% and 91% accuracy on the same model, same task, same data is not a better model. It is a better reasoning strategy.Phase 11: LLM Engineering / ~45 minutes lessonStructured Outputs: JSON, Schema Validation, Constrained DecodingYour LLM returns a string. Your application needs JSON. That gap has crashed more production systems than any model hallucination. Structured output is the bridge between natural language and typed data. Get it right and your LLM becomes a...Phase 11: LLM Engineering / ~90 minutes lessonEmbeddings & Vector RepresentationsText is discrete. Math is continuous. Every time you ask an LLM to find "similar" documents, compare meanings, or search beyond keywords, you're relying on a bridge between these two worlds. That bridge is an embedding. If you don't unders...Phase 11: LLM Engineering / ~75 minutes lessonContext Engineering: Windows, Budgets, Memory, and RetrievalPrompt engineering is a subset. Context engineering is the whole game. 
A prompt is a string you type. Context is everything that goes into the model's window: system instructions, retrieved documents, tool definitions, conversation history...Phase 11: LLM Engineering / ~90 minutes lessonRAG (Retrieval-Augmented Generation)Your LLM knows everything up to its training cutoff. It knows nothing about your company's docs, your codebase, or last week's meeting notes. RAG solves this by retrieving relevant documents and stuffing them into the prompt. It's the most...Phase 11: LLM Engineering / ~90 minutes lessonAdvanced RAG (Chunking, Reranking, Hybrid Search)Basic RAG retrieves the top-k most similar chunks. That works for simple questions. It falls apart for multi-hop reasoning, ambiguous queries, and large corpora. Advanced RAG is the difference between a demo that works on 10 documents and...Phase 11: LLM Engineering / ~90 minutes lessonFine-Tuning with LoRA & QLoRAFull fine-tuning a 7B model requires 56GB of VRAM. You don't have that. Neither do most companies. LoRA lets you fine-tune the same model in 6GB by training less than 1% of the parameters. This isn't a compromise -- it matches full fine-tu...Phase 11: LLM Engineering / ~75 minutes lessonFunction Calling & Tool UseLLMs cannot do anything. They generate text. That is the entire capability. They cannot check the weather, query a database, send an email, run code, or read a file. Every "AI agent" you have ever seen is an LLM generating JSON that says w...Phase 11: LLM Engineering / ~75 minutes lessonEvaluation & Testing LLM ApplicationsYou would never deploy a web app without tests. You would never ship a database migration without a rollback plan. But right now, most teams ship LLM applications by reading 10 outputs and saying "yeah, looks good." That is not evaluation....Phase 11: LLM Engineering / ~45 minutes lessonCaching, Rate Limiting & Cost OptimizationMost AI startups do not die from bad models. They die from bad unit economics. 
A single GPT-4o call costs fractions of a cent. Ten thousand users making ten calls per day cost $250 in input tokens alone -- before you charge a single dolla...Phase 11: LLM Engineering / ~45 minutes lessonGuardrails, Safety & Content FilteringYour LLM application will be attacked. Not might. Will. The first prompt injection attempt against your production system will come within 48 hours of launch. The question is not whether someone will try "ignore previous instructions and r...Phase 11: LLM Engineering / ~45 minutes lessonBuilding a Production LLM ApplicationYou have built prompts, embeddings, RAG pipelines, function calling, caching layers, and guardrails. Separately. In isolation. Like practicing guitar scales without ever playing a song. This lesson is the song. You will wire every componen...Phase 11: LLM Engineering / ~120 minutes lessonModel Context Protocol (MCP)Every LLM app built before 2025 invented its own tool schema. Then Anthropic shipped MCP, Claude adopted it, OpenAI adopted it, and by 2026 it is the default wire format for connecting any LLM to any tool, data source, or agent. Write one...Phase 11: LLM Engineering / ~75 minutes lessonPrompt Caching and Context CachingYour system prompt is 4,000 tokens. Your RAG context is 20,000 tokens. You send both with every request. You also pay for both — every time. Prompt caching lets the provider keep that prefix warm on their side and bill you 10% of the norma...Phase 11: LLM Engineering / ~60 minutes lessonLangGraph — State Machines for AgentsA ReAct loop written by hand is a while True. A ReAct loop written in LangGraph is a graph you can checkpoint, interrupt, branch, and time-travel through. The agent hasn't changed. The harness around it has.Phase 11: LLM Engineering / ~75 minutes lessonAgent Framework Tradeoffs — LangGraph vs CrewAI vs AutoGen vs AgnoEvery framework sells the same demo (research agent builds a report) and hides the same bug (state schema fights with the orchestration layer). 
Pick the framework whose abstractions match the shape of your problem; everything else is glue...Phase 11: LLM Engineering / ~45 minutes lessonVision Transformers and the Patch-Token PrimitiveBefore anything multimodal, an image has to become a sequence of tokens a transformer can eat. The 2020 ViT paper answered this with 16x16 pixel patches, a linear projection, and a position embedding. Five years later every 2026 frontier m...Phase 12: Multimodal AI / ~120 minutes lessonCLIP and Contrastive Vision-Language PretrainingOpenAI's CLIP (2021) proved a single idea big enough to power the next five years: align an image encoder and a text encoder in the same vector space using only noisy web image-caption pairs and a contrastive loss. Zero supervised labels....Phase 12: Multimodal AI / ~180 minutes lessonFrom CLIP to BLIP-2 — Q-Former as Modality BridgeCLIP aligns image and text but cannot generate captions, answer questions, or hold a conversation. BLIP-2 (Salesforce, 2023) solved that with a small trainable bridge: 32 learnable query vectors attend over a frozen ViT's features via cros...Phase 12: Multimodal AI / ~180 minutes lessonFlamingo and Gated Cross-Attention for Few-Shot VLMsDeepMind's Flamingo (2022) did two things before anyone else. It showed a single model could process arbitrarily interleaved sequences of images, videos, and text. And it showed VLMs could learn in-context — give a few-shot prompt with thr...Phase 12: Multimodal AI / ~120 minutes lessonLLaVA and Visual Instruction TuningLLaVA (April 2023) is the most copied multimodal architecture on the planet. It replaced BLIP-2's Q-Former with a 2-layer MLP, replaced Flamingo's gated cross-attention with naive token concatenation, and trained on 158k visual-instruction...Phase 12: Multimodal AI / ~180 minutes lessonAny-Resolution Vision: Patch-n'-Pack and NaFlexReal images are not 224x224 squares. A receipt is 9:16, a chart is 16:9, a medical scan might be 4096x4096, a mobile screenshot is 9:19.5. 
The pre-2024 VLM answer — resize everything to a fixed square — threw away the signal that makes OCR...Phase 12: Multimodal AI / ~120 minutes lessonOpen-Weight VLM Recipes: What Actually MattersThe 2024-2026 open-weight VLM literature is a forest of ablation tables. Apple's MM1 tested 13 combinations of image encoder, connector, and data mix. Allen AI's Molmo proved detailed human captions beat GPT-4V distillation. Cambrian-1 ran...Phase 12: Multimodal AI / ~180 minutes lessonLLaVA-OneVision: Single-Image, Multi-Image, Video in One ModelBefore LLaVA-OneVision (Li et al., August 2024) the open-VLM world had separate lineages: LLaVA-1.5 for single images, multi-image models like Mantis and VILA, video models like Video-LLaVA and Video-LLaMA. Each won its benchmark and faile...Phase 12: Multimodal AI / ~180 minutes lessonQwen-VL Family and Dynamic-FPS VideoThe Qwen-VL family — Qwen-VL (2023), Qwen2-VL (2024), Qwen2.5-VL (2025), Qwen3-VL (2025) — is the most influential open vision-language model lineage in 2026. Each generation made a single decisive architectural bet that the rest of the op...Phase 12: Multimodal AI / ~120 minutes lessonInternVL3: Native Multimodal PretrainingEvery open VLM before InternVL3 followed the same three-step recipe: take a text LLM trained on trillions of text tokens, bolt on a vision encoder, then fine-tune the seams. This works but has alignment debt — the text LLM has spent its fu...Phase 12: Multimodal AI / ~120 minutes lessonChameleon and Early-Fusion Token-Only Multimodal ModelsEvery VLM we have seen so far keeps images and text separate. Visual tokens come from a vision encoder, flow into a projector, then meet text inside the LLM. The vision and text vocabularies never overlap. 
Chameleon (Meta, May 2024) asked:...Phase 12: Multimodal AI / ~180 minutes lessonEmu3: Next-Token Prediction for Image and Video GenerationBAAI's Emu3 (Wang et al., September 2024) is the 2024 result that should have ended the diffusion-versus-autoregressive debate. A single Llama-style decoder-only transformer, trained only on the next-token-prediction objective, across a un...Phase 12: Multimodal AI / ~120 minutes lessonTransfusion: Autoregressive Text + Diffusion Image in One TransformerChameleon and Emu3 bet everything on discrete tokens. They work, but the quantization bottleneck is visible — the image quality plateaus below continuous-space diffusion models. Transfusion (Meta, Zhou et al., August 2024) takes the opposi...Phase 12: Multimodal AI / ~180 minutes lessonShow-o and Discrete-Diffusion Unified ModelsTransfusion mixes continuous and discrete representations. Show-o (Xie et al., August 2024) goes the other way: text tokens use causal next-token prediction, image tokens use masked discrete diffusion in the spirit of MaskGIT. Both sit ins...Phase 12: Multimodal AI / ~120 minutes lessonJanus-Pro: Decoupled Encoders for Unified Multimodal ModelsUnified multimodal models have an unavoidable tension. Understanding wants semantic features — SigLIP or DINOv2 output vectors rich with concept-level information. Generation wants reconstruction-friendly codes — VQ tokens that compose bac...Phase 12: Multimodal AI / ~120 minutes lessonMIO and Any-to-Any Streaming Multimodal ModelsGPT-4o ships a product most open models cannot replicate: an agent that hears voice, sees video, and speaks back in real time. The open-ecosystem answer by late 2024 was MIO (Wang et al., September 2024). MIO tokenizes text, image, speech,...Phase 12: Multimodal AI / ~120 minutes lessonVideo-Language Models: Temporal Tokens and GroundingVideo is not a stack of photos. A 5-second clip has causal ordering, action verbs, and event timing that an image model cannot represent. 
Video-LLaMA (Zhang et al., June 2023) shipped the first open video-LLM with audio-visual grounding. V...Phase 12: Multimodal AI / ~180 minutes lessonLong-Video Understanding at Million-Token ContextA 1-hour 4K video at 24 FPS, patched and embedded, produces on the order of 60 million tokens. A 2-hour podcast episode transcribed is 30,000 tokens. A full Blu-ray feature film, even compressed with aggressive pooling, is hundreds of thou...Phase 12: Multimodal AI / ~180 minutes lessonAudio-Language Models: the Whisper to Audio Flamingo 3 ArcWhisper (Radford et al., December 2022) settled speech recognition — 680k hours of weakly-supervised multilingual speech, a simple encoder-decoder transformer, a benchmark that made every subsequent ASR release cite it. But recognition is...Phase 12: Multimodal AI / ~180 minutes lessonOmni Models: Qwen2.5-Omni and the Thinker-Talker SplitGPT-4o's product demo in May 2024 was disruptive not because of the underlying model but because of the product shape — a voice interface where you talk, the model sees what the camera sees, and it talks back in under 250ms. The open ecosy...Phase 12: Multimodal AI / ~180 minutes lessonEmbodied VLAs: RT-2, OpenVLA, π0, GR00TThe first time a model read a recipe off a website and executed it in a kitchen robot was RT-2 (Google DeepMind, July 2023). RT-2 discretized actions as text tokens, co-fine-tuned a VLM on web data plus robot-action data, and proved that w...Phase 12: Multimodal AI / ~180 minutes lessonDocument and Diagram UnderstandingDocuments are not photos. A PDF, scientific paper, invoice, or handwritten form has layout, tables, diagrams, footnotes, headers, and semantic structure that plain image understanding cannot capture. The pre-VLM stack was a pipeline: Tesse...Phase 12: Multimodal AI / ~180 minutes lessonColPali and Vision-Native Document RAGTraditional RAG parses PDFs into text, splits into chunks, embeds chunks, stores vectors. 
Every step loses signal: OCR drops chart data, chunking breaks table rows, text embeddings ignore figures. ColPali (Faysse et al., July 2024) asked t...
Phase 12: Multimodal AI / ~180 minutes

lesson
Multimodal RAG and Cross-Modal Retrieval
Vision-native document RAG is one slice. Production multimodal RAG goes wider — retrieving across text, images, audio, and video for workflows like trip planning ("find me a quiet vegan brunch with natural light"), medical triage ("what in...
Phase 12: Multimodal AI / ~180 minutes

lesson
Multimodal Agents and Computer-Use (Capstone)
The 2026 frontier product is a multimodal agent that reads screenshots, clicks buttons, navigates web UIs, fills forms, and completes workflows end-to-end. SeeClick and CogAgent (2024) proved the GUI-grounding primitive. Ferret-UI added mo...
Phase 12: Multimodal AI / ~240 minutes

lesson
The Tool Interface — Why Agents Need Structured I/O
A language model produces tokens. A program takes actions. The gap between those two is the tool interface: a contract that lets the model request an action and the host execute it. Every 2026 stack — function calling on OpenAI, Anthropic,...
Phase 13: Tools & Protocols / ~45 minutes

lesson
Function Calling Deep Dive — OpenAI, Anthropic, Gemini
The three frontier providers converged on the same tool-call loop in 2024 and then diverged on everything else. OpenAI uses tools and tool_calls. Anthropic uses tool_use and tool_result blocks. Gemini uses functionDeclarations and unique-i...
Phase 13: Tools & Protocols / ~75 minutes

lesson
Parallel Tool Calls and Streaming with Tools
Three independent weather lookups serialized is three round trips. Run them in parallel and total time collapses to the slowest single call. Every frontier provider now emits multiple tool calls in a single turn.
The payoff is real; the pl...
Phase 13: Tools & Protocols / ~75 minutes

lesson
Structured Output — JSON Schema, Pydantic, Zod, Constrained Decoding
"Ask the model nicely to return JSON" fails 5 to 15 percent of the time, even on frontier models. Structured outputs close that gap with constrained decoding: the model is literally prevented from emitting a token that would violate the sc...
Phase 13: Tools & Protocols / ~75 minutes

lesson
Tool Schema Design — Naming, Descriptions, Parameter Constraints
A correct tool fails silently when the model cannot tell when to use it. Naming, descriptions, and parameter shapes drive 10 to 20 percentage-point swings in tool-selection accuracy on benchmarks like StableToolBench and MCPToolBench++. Th...
Phase 13: Tools & Protocols / ~45 minutes

lesson
MCP Fundamentals — Primitives, Lifecycle, JSON-RPC Base
Every integration before MCP was a one-off. The Model Context Protocol, first shipped by Anthropic in November 2024 and now stewarded by the Linux Foundation's Agentic AI Foundation, standardizes discovery and invocation so any client can...
Phase 13: Tools & Protocols / ~45 minutes

lesson
Building an MCP Server — Python + TypeScript SDKs
Most MCP tutorials show only stdio hello-worlds. A real server exposes tools plus resources plus prompts, handles capability negotiation, emits structured errors, and works the same across SDKs. This lesson builds a notes server end-to-end...
Phase 13: Tools & Protocols / ~75 minutes

lesson
Building an MCP Client — Discovery, Invocation, Session Management
Most MCP content ships server tutorials and waves a hand at the client. Client code is where the hard orchestration lives: process spawning, capability negotiation, tool list merging across multiple servers, sampling callbacks, reconnectio...
Phase 13: Tools & Protocols / ~75 minutes

lesson
MCP Transports — stdio vs Streamable HTTP vs SSE Migration
stdio works locally and nowhere else. Streamable HTTP (2025-03-26) is the remote standard.
The old HTTP+SSE transport is deprecated and being removed in mid-2026. Picking the wrong transport costs a migration; picking the right one buys a...
Phase 13: Tools & Protocols / ~45 minutes

lesson
MCP Resources and Prompts — Context Exposure Beyond Tools
Tools get 90 percent of MCP attention. The other two server primitives solve different problems. Resources expose data for reading; prompts expose reusable templates as slash-commands. Many servers should use resources instead of wrapping...
Phase 13: Tools & Protocols / ~45 minutes

lesson
MCP Sampling — Server-Requested LLM Completions and Agent Loops
Most MCP servers are dumb executors: take arguments, run code, return content. Sampling lets a server flip direction: it asks the client's LLM to make a decision. This enables server-hosted agent loops without the server owning any model c...
Phase 13: Tools & Protocols / ~75 minutes

lesson
Roots and Elicitation — Scoping and Mid-Flight User Input
Hard-coded paths break the moment a user opens a different project. Pre-filled tool arguments break when the user under-specifies. Roots scope the server to a user-controlled set of URIs; elicitation pauses mid-tool-call to ask the user fo...
Phase 13: Tools & Protocols / ~45 minutes

lesson
Async Tasks (SEP-1686) — Call-Now, Fetch-Later for Long-Running Work
Real agent work takes minutes to hours: CI runs, deep-research synthesis, batch exports. Synchronous tool calls drop connections, time out, or block the UI. SEP-1686, merged in 2025-11-25, adds a Tasks primitive: any request can be augment...
Phase 13: Tools & Protocols / ~75 minutes

lesson
MCP Apps — Interactive UI Resources via `ui://`
Text-only tool output caps what agents can show. MCP Apps (SEP-1724, official January 26, 2026) let a tool return sandboxed interactive HTML rendered inline in Claude Desktop, ChatGPT, Cursor, Goose, and VS Code.
Dashboards, forms, maps, 3...
Phase 13: Tools & Protocols / ~75 minutes

lesson
MCP Security I — Tool Poisoning, Rug Pulls, Cross-Server Shadowing
Tool descriptions land in the model's context verbatim. Malicious servers embed hidden instructions that users never see. Research in 2025-2026 from Invariant Labs, Unit 42, and an arXiv study published March 2026 measured attack-success r...
Phase 13: Tools & Protocols / ~45 minutes

lesson
MCP Security II — OAuth 2.1, Resource Indicators, Incremental Scopes
Remote MCP servers need authorization, not just authentication. The 2025-11-25 spec aligns with OAuth 2.1 + PKCE + resource indicators (RFC 8707) + protected-resource metadata (RFC 9728). SEP-835 adds incremental scope consent with step-up...
Phase 13: Tools & Protocols / ~75 minutes

lesson
MCP Gateways and Registries — Enterprise Control Planes
Enterprises cannot let every dev install random MCP servers. A gateway centralizes auth, RBAC, audit, rate limiting, caching, and tool-poisoning detection, then exposes the merged tool surface as a single MCP endpoint. The Official MCP Reg...
Phase 13: Tools & Protocols / ~45 minutes

lesson
MCP Auth in Production — Enrollment, JWKS Refresh, Audience-Pinned Tokens
Lesson 16 stood up the OAuth 2.1 state machine in memory. By 2026, every MCP server you ship to a real org sits behind production auth: client enrollment that scales to an unbounded client population (Client ID Metadata Documents first, dy...
Phase 13: Tools & Protocols / ~90 minutes

lesson
A2A — Agent-to-Agent Protocol
MCP is agent-to-tool. A2A (Agent2Agent) is agent-to-agent — an open protocol for letting opaque agents built on different frameworks collaborate. Released by Google in April 2025, donated to the Linux Foundation in June 2025, reaching v1.0...
Phase 13: Tools & Protocols / ~75 minutes

lesson
OpenTelemetry GenAI — Tracing Tool Calls End-to-End
An agent calls five tools, three MCP servers, and two sub-agents. You need one trace across all of it.
The OpenTelemetry GenAI semantic conventions (stable attributes in v1.37 and up) are the 2026 standard, natively supported by Datadog, L...
Phase 13: Tools & Protocols / ~75 minutes

lesson
LLM Routing Layer — LiteLLM, OpenRouter, Portkey
Provider lock-in is expensive. Different tool-calling workloads suit different models. Routing gateways give one API surface, retries, failover, cost tracking, and guardrails. Three archetypes dominate 2026: LiteLLM (open-source self-hoste...
Phase 13: Tools & Protocols / ~45 minutes

lesson
Skills and Agent SDKs — Anthropic Skills, AGENTS.md, OpenAI Apps SDK
MCP says "what tools exist." Skills say "how to do a task." The 2026 stack layers both. Anthropic's Agent Skills (open standard, December 2025) ship as SKILL.md with progressive disclosure. OpenAI's Apps SDK is MCP plus widget metadata. AG...
Phase 13: Tools & Protocols / ~45 minutes

lesson
Capstone — Build a Complete Tool Ecosystem
Phase 13 taught every piece. This capstone wires them into one production-shaped system: an MCP server with tools + resources + prompts + tasks + UI, OAuth 2.1 at the edge, an RBAC gateway, a multi-server client, an A2A sub-agent call, OTe...
Phase 13: Tools & Protocols / ~120 minutes

lesson
The Agent Loop: Observe, Think, Act
Every agent in 2026 — Claude Code, Cursor, Devin, Operator — is a variant of the ReAct loop from 2022. Reasoning tokens interleave with tool calls and observations until a stop condition fires. Learn this loop cold before touching any fram...
Phase 14: Agent Engineering / ~60 minutes

lesson
ReWOO and Plan-and-Execute: Decoupled Planning
ReAct interleaves thought and action in one stream. ReWOO separates them: one big plan up front, then execute. 5x fewer tokens, +4% accuracy on HotpotQA, and you can distill the planner into a 7B model.
Plan-and-Execute generalized it; Pla...
Phase 14: Agent Engineering / ~60 minutes

lesson
Reflexion: Verbal Reinforcement Learning
Gradient-based RL needs thousands of trials and a GPU cluster to fix a failure mode. Reflexion (Shinn et al., NeurIPS 2023) does it in natural language: after each failed trial, the agent writes a reflection, stores it in episodic memory,...
Phase 14: Agent Engineering / ~60 minutes

lesson
Tree of Thoughts and LATS: Deliberate Search
A single chain-of-thought trajectory has no room to backtrack. ToT (Yao et al., 2023) turns reasoning into a tree with self-evaluation on each node. LATS (Zhou et al., 2024) unifies ToT with ReAct and Reflexion under Monte Carlo Tree Searc...
Phase 14: Agent Engineering / ~75 minutes

lesson
Self-Refine and CRITIC: Iterative Output Improvement
Self-Refine (Madaan et al., 2023) uses one LLM in three roles — generate, feedback, refine — in a loop. Average gain: +20 absolute on 7 tasks. CRITIC (Gou et al., 2023) hardens the feedback step by routing verification through external too...
Phase 14: Agent Engineering / ~60 minutes

lesson
Tool Use and Function Calling
Toolformer (Schick et al., 2023) started self-supervised tool annotation. Berkeley Function Calling Leaderboard V4 (Patil et al., 2025) sets the 2026 bar: 40% agentic, 30% multi-turn, 10% live, 10% non-live, 10% hallucination. Single-turn...
Phase 14: Agent Engineering / ~60 minutes

lesson
Memory: Virtual Context and MemGPT
Context windows are finite. Conversations, documents, and tool traces are not. MemGPT (Packer et al., 2023) frames this as OS virtual memory — main context is RAM, external store is disk, the agent pages between them. This is the pattern e...
Phase 14: Agent Engineering / ~75 minutes

lesson
Memory Blocks and Sleep-Time Compute (Letta)
MemGPT became Letta in 2024.
The 2026 evolution adds two ideas: discrete functional memory blocks the model can edit directly, and a sleep-time agent that consolidates memory asynchronously while the primary agent is idle. This is how you...
Phase 14: Agent Engineering / ~75 minutes

lesson
Hybrid Memory: Vector + Graph + KV (Mem0)
Mem0 (Chhikara et al., 2025) treats memory as three stores in parallel — vector for semantic similarity, KV for fast fact lookup, graph for entity-relationship reasoning. A scoring layer fuses the three on retrieval. This is the 2026 produ...
Phase 14: Agent Engineering / ~75 minutes

lesson
Skill Libraries and Lifelong Learning (Voyager)
Voyager (Wang et al., TMLR 2024) treats executable code as a skill. Skills are named, retrievable, composable, and refined by environment feedback. This is the reference architecture for Claude Agent SDK skills, skillkit, and the 2026 skil...
Phase 14: Agent Engineering / ~75 minutes

lesson
Planning with HTN and Evolutionary Search
Symbolic planning handles the cases where the plan is provably correct. Evolutionary code search handles the cases where the fitness function is machine-checkable. ChatHTN (2025) and AlphaEvolve (2025) show what each unlocks when paired wi...
Phase 14: Agent Engineering / ~75 minutes

lesson
Anthropic's Workflow Patterns: Simple Over Complex
Schluntz and Zhang (Anthropic, Dec 2024) distinguish workflows (predefined paths) from agents (dynamic tool-use). Five workflow patterns cover most cases. Start with direct API calls. Add agents only when steps cannot be predicted.
Phase 14: Agent Engineering / ~60 minutes

lesson
LangGraph: Stateful Graphs and Durable Execution
LangGraph is the 2026 reference for low-level stateful orchestration. An agent is a state machine; nodes are functions; edges are transitions; state is immutable and checkpointed after every step.
Resume from any failure exactly where it left...
Phase 14: Agent Engineering / ~75 minutes

lesson
AutoGen v0.4: Actor Model and Agent Framework
AutoGen v0.4 (Microsoft Research, Jan 2025) redesigned agent orchestration around the actor model. Async message exchange, event-driven agents, fault isolation, natural concurrency. The framework is now in maintenance mode while Microsoft...
Phase 14: Agent Engineering / ~75 minutes

lesson
CrewAI: Role-Based Crews and Flows
CrewAI is the 2026 role-based multi-agent framework. Four primitives: Agent, Task, Crew, Process. Two top-level shapes: Crews (autonomous, role-based collaboration) and Flows (event-driven, deterministic). The docs are blunt: "for any prod...
Phase 14: Agent Engineering / ~75 minutes

lesson
OpenAI Agents SDK: Handoffs, Guardrails, Tracing
OpenAI Agents SDK is the lightweight multi-agent framework built on the Responses API. Five primitives: Agent, Handoff, Guardrail, Session, Tracing. Handoffs are tools named transfer_to_. Guardrails trip on input or output. Tracing is on b...
Phase 14: Agent Engineering / ~75 minutes

lesson
Claude Agent SDK: Subagents and Session Store
The Claude Agent SDK is the library form of the Claude Code harness. Built-in tools, subagents for context isolation, hooks, W3C trace propagation, session store parity. Claude Managed Agents is the hosted alternative for long-running asyn...
Phase 14: Agent Engineering / ~75 minutes

lesson
Agno and Mastra: Production Runtimes
Agno (Python) and Mastra (TypeScript) are the 2026 production-runtime pairing. Agno aims at microsecond agent instantiation and stateless FastAPI backends. Mastra ships agents, tools, workflows, unified model routing, and composite storage...
Phase 14: Agent Engineering / ~45 minutes

lesson
Benchmarks: SWE-bench, GAIA, AgentBench
Three benchmarks anchor agent evaluation in 2026. SWE-bench tests code patching. GAIA tests generalist tool use. AgentBench tests multi-environment reasoning.
Know their composition, their contamination story, and what they do not measure.
Phase 14: Agent Engineering / ~60 minutes

lesson
Benchmarks: WebArena and OSWorld
WebArena tests web-agent capability across four self-hosted apps. OSWorld tests desktop-agent capability across Ubuntu, Windows, macOS. At release (2023–2024) both showed a big gap between best-in-class agents and humans. The gap is narrow...
Phase 14: Agent Engineering / ~60 minutes

lesson
Computer Use: Claude, OpenAI CUA, Gemini
Three production computer-use models in 2026. All three are vision-based. All three treat screenshots, DOM text, and tool outputs as untrusted input. Only direct user instructions count as permission. Per-step safety services are the norm.
Phase 14: Agent Engineering / ~60 minutes

lesson
Voice Agents: Pipecat and LiveKit
Voice agents are a first-class production category in 2026. Pipecat gives you a Python frame-based pipeline (VAD → STT → LLM → TTS → transport). LiveKit Agents bridges AI models to users over WebRTC. Production latency targets land at 450–...
Phase 14: Agent Engineering / ~60 minutes

lesson
OpenTelemetry GenAI Semantic Conventions
OpenTelemetry's GenAI SIG (launched April 2024) defines the standard schema for agent telemetry. Span names, attributes, and content-capture rules converge across vendors so agent traces mean the same thing in Datadog, Grafana, Jaeger, and...
Phase 14: Agent Engineering / ~60 minutes

lesson
Agent Observability: Langfuse, Phoenix, Opik
Three open-source agent observability platforms dominate 2026. Langfuse (MIT) — 6M+ installs/month, tracing + prompt management + evals + session replay. Arize Phoenix (Elastic 2.0) — deep agent-specific evals, RAG relevancy, OpenInference...
Phase 14: Agent Engineering / ~45 minutes

lesson
Multi-Agent Debate and Collaboration
Du et al. (ICML 2024, "Society of Minds") run N model instances that independently propose answers, then iteratively critique each other over R rounds to converge.
Improves factuality, rule-following, reasoning. Sparse topology beats full...
Phase 14: Agent Engineering / ~60 minutes

lesson
Failure Modes: Why Agents Break
MASFT (Berkeley, 2025) catalogs 14 multi-agent failure modes in 3 categories. Microsoft's Taxonomy documents how existing AI failures amplify in agentic settings. Industry field data converges on five recurring modes: hallucinated actions,...
Phase 14: Agent Engineering / ~60 minutes

lesson
Prompt Injection and the PVE Defense
Greshake et al. (AISec 2023) established indirect prompt injection as the defining agent security problem. Attacker plants instructions in data the agent retrieves; on ingest, those instructions override the developer prompt. Treat all ret...
Phase 14: Agent Engineering / ~75 minutes

lesson
Orchestration Patterns: Supervisor, Swarm, Hierarchical
Four orchestration patterns recur across 2026 frameworks: supervisor-worker, swarm / peer-to-peer, hierarchical, debate. Anthropic's guidance: "It's about building the right system for your needs." Start simple; add topology only when a si...
Phase 14: Agent Engineering / ~60 minutes

lesson
Production Runtimes: Queue, Event, Cron
Production agents run on six runtime shapes: request-response, streaming, durable execution, queue-based background, event-driven, and scheduled. Pick the shape before you pick the framework. Observability is load-bearing at every shape.
Phase 14: Agent Engineering / ~60 minutes

lesson
Eval-Driven Agent Development
Anthropic's guidance: "start with simple prompts, optimize them with comprehensive evaluation, and add multi-step agentic systems only when needed." Evaluation is not the last step. It's the outer loop that drives every other choice in Pha...
Phase 14: Agent Engineering / ~60 minutes

lesson
Agent Workbench Engineering: Why Capable Models Still Fail
A capable model is not enough. Reliable agents need a workbench: instructions, state, scope, feedback, verification, review, and handoff.
Strip those away and even a frontier model produces work that is unsafe to ship.
Phase 14: Agent Engineering / ~45 minutes

lesson
The Minimal Agent Workbench
The smallest useful workbench is three files: a root instructions router, a state file, and a task board. Everything else is layered on top. If a repo cannot carry these three, no model will save it.
Phase 14: Agent Engineering / ~45 minutes

lesson
Agent Instructions as Executable Constraints
Instructions written as prose are wishes. Instructions written as constraints are tests. The workbench turns each rule into something an agent can check at runtime and a reviewer can verify after the fact.
Phase 14: Agent Engineering / ~50 minutes

lesson
Repo Memory and Durable State
Chat history is volatile. The repo is durable. The workbench stores agent state in versioned files so the next session, the next agent, and the next reviewer all read from the same source of truth.
Phase 14: Agent Engineering / ~60 minutes

lesson
Initialization Scripts for Agents
Every session that starts cold pays a tax. The agent reads the same files, retries the same probes, and rediscovers the same paths. An init script pays the tax once and writes the answers into state.
Phase 14: Agent Engineering / ~45 minutes

lesson
Scope Contracts and Task Boundaries
The model does not know where the work ends. A scope contract is a per-task file that says where the work begins, where it ends, and how to roll back if it spills. The contract turns "stay in scope" from a wish into a check.
Phase 14: Agent Engineering / ~50 minutes

lesson
Runtime Feedback Loops
Agents that do not see real command output guess. A feedback runner captures stdout, stderr, exit code, and timing into a structured record the next turn can read. Then the agent reacts to facts instead of to its own prediction of facts.
Phase 14: Agent Engineering / ~50 minutes

lesson
Verification Gates
The agent does not get to mark its own work as done.
A verification gate reads the scope contract, the feedback log, the rule report, and the diff, and answers a single question: is this task actually complete? If the gate says no, the tas...
Phase 14: Agent Engineering / ~55 minutes

lesson
Reviewer Agent: Separate Builder from Marker
The agent that wrote the code cannot grade it. A reviewer is a second loop with a different system prompt, a different goal, and read-only access to everything the builder produced. The gap between builder and reviewer is where most reliab...
Phase 14: Agent Engineering / ~55 minutes

lesson
Multi-Session Handoff
The session is going to end. The work is not. The handoff packet is the artifact that turns "the agent worked for an hour" into "the next session is productive in the first minute." Build it on purpose, not as an afterthought.
Phase 14: Agent Engineering / ~50 minutes

lesson
The Workbench on a Real Repo
Eleven lessons of surfaces are worth nothing if they do not survive contact with a real codebase. This lesson runs the same task twice on a small sample app: prompt-only versus workbench-guided. The numbers do the arguing.
Phase 14: Agent Engineering / ~60 minutes

lesson
Capstone: Ship a Reusable Agent Workbench Pack
The mini-track ends with a pack you drop into any repo. Eleven lessons of surfaces compressed into a directory you can cp -r and have an agent working reliably the next morning. The capstone is the artifact this curriculum trades on.
Phase 14: Agent Engineering / ~75 minutes

lesson
The Shift from Chatbots to Long-Horizon Agents
In 2023 a chatbot answered a question in one turn. In 2026 a frontier model routinely runs minutes to hours on a single task. METR's Time Horizon 1.1 benchmark (January 2026) puts Claude Opus 4.6 at 14+ hours of expert work at 50% reliabil...
Phase 15: Autonomous Systems / ~45 minutes

lesson
STaR, V-STaR, Quiet-STaR — Self-Taught Reasoning
The smallest possible self-improvement loop sits inside the rationale.
A model generates a chain of thought, keeps the ones that land on correct answers, and fine-tunes on those. That is STaR. V-STaR adds a verifier so inference-time selec...
Phase 15: Autonomous Systems / ~60 minutes

lesson
AlphaEvolve — Evolutionary Coding Agents
Pair a frontier coding model with an evolutionary loop and a machine-checkable evaluator. Let the loop run long enough. It discovers a 4x4 complex-matrix multiplication procedure that uses 48 scalar multiplications — the first improvement...
Phase 15: Autonomous Systems / ~60 minutes

lesson
Darwin Godel Machine — Open-Ended Self-Modifying Agents
Schmidhuber's 2003 Godel Machine required a formal proof that any self-modification was beneficial before accepting it. That proof is impossible in practice. Darwin Godel Machine (Zhang et al., 2025) drops the proof and keeps the archive:...
Phase 15: Autonomous Systems / ~60 minutes

lesson
AI Scientist v2 — Workshop-Level Autonomous Research
Sakana's AI Scientist v2 (Yamada et al., arXiv:2504.08066) runs the full research loop: hypothesis, code, experiments, figures, writeup, submission. It is the first system to have a generated paper pass peer review at an ICLR 2025 workshop...
Phase 15: Autonomous Systems / ~60 minutes

lesson
Automated Alignment Research (Anthropic AAR)
Anthropic ran parallel teams of Claude Opus 4.6 Autonomous Alignment Researchers in independent sandboxes, coordinating via a shared forum whose logs live outside any sandbox (so agents cannot delete their own records). On the weak-to-stro...
Phase 15: Autonomous Systems / ~60 minutes

lesson
Recursive Self-Improvement — Capability vs Alignment
Recursive self-improvement (RSI) is no longer speculation. The ICLR 2026 RSI Workshop in Rio (April 23-27) framed it as an engineering problem with concrete tooling.
Demis Hassabis at WEF 2026 asked publicly whether the loop can close with...
Phase 15: Autonomous Systems / ~60 minutes

lesson
Bounded Self-Improvement Designs
Research has converged on four primitives for bounding a self-improvement loop. Formal invariants that must hold across every edit. Alignment anchors that cannot be modified. Multi-objective constraints where every dimension (safety, fairn...
Phase 15: Autonomous Systems / ~60 minutes

lesson
The Autonomous Coding Agent Landscape (2026)
SWE-bench Verified went from 4% to 80.9% in under three years. The same Claude Sonnet 4.5 scored 43.2% on SWE-agent v1 and 59.8% on Cline autonomous — the scaffolding around the model now matters as much as the model itself. OpenHands (formerl...
Phase 15: Autonomous Systems / ~45 minutes

lesson
Claude Code as an Autonomous Agent: Permission Modes and Auto Mode
Claude Code exposes seven permission modes. "plan" asks before every action, "default" asks only for risky ones, "acceptEdits" auto-approves file writes but still confirms shell execution, and "bypassPermissions" approves everything. Auto...
Phase 15: Autonomous Systems / ~45 minutes

lesson
Browser Agents and Long-Horizon Web Tasks
ChatGPT agent (July 2025) merged Operator and deep research into one browser/terminal agent and set BrowseComp SOTA at 68.9%. OpenAI shut Operator down August 31, 2025 — consolidation at the product layer. Anthropic's Vercept acquisition m...
Phase 15: Autonomous Systems / ~45 minutes

lesson
Long-Running Background Agents: Durable Execution
Production long-horizon agents do not run in while True. Every LLM call becomes an activity with checkpoint, retry, and replay. Temporal's OpenAI Agents SDK integration went GA March 2026. Claude Code Routines (Anthropic) runs scheduled Cl...
Phase 15: Autonomous Systems / ~60 minutes

lesson
Action Budgets, Iteration Caps, and Cost Governors
A mid-sized e-commerce agent's monthly LLM cost jumped from $1,200 to $4,800 after its team enabled the "order-tracking" skill.
That is not a pricing bug. That is an agent that found a new loop and kept spending inside it. Microsoft's Agen...
Phase 15: Autonomous Systems / ~60 minutes

lesson
Kill Switches, Circuit Breakers, and Canary Tokens
A kill switch is a boolean held outside the agent's edit surface — a Redis key, a feature flag, a signed config — that disables the agent entirely. A circuit breaker is finer-grained: it trips on a specific pattern (five identical tool cal...
Phase 15: Autonomous Systems / ~60 minutes

lesson
Human-in-the-Loop: Propose-Then-Commit
The 2026 consensus on HITL is specific. It is not "the agent asks, the user clicks Approve." It is propose-then-commit: the proposed action is persisted to a durable store with an idempotency key; surfaced to a reviewer with intent, data l...
Phase 15: Autonomous Systems / ~60 minutes

lesson
Checkpoints and Rollback
Every graph-state transition persists. When a worker crashes, its lease expires and another worker picks up at the latest checkpoint. Cloudflare Durable Objects hold state across hours or weeks. Propose-then-commit (Lesson 15) defines a ro...
Phase 15: Autonomous Systems / ~60 minutes

lesson
Constitutional AI and Rule Overrides
Anthropic's January 22, 2026 Claude Constitution runs 79 pages and is CC0. It moves from rule-based to reason-based alignment and establishes a four-tier priority hierarchy: (1) safety and supporting human oversight, (2) ethics, (3) Anthro...
Phase 15: Autonomous Systems / ~60 minutes

lesson
Llama Guard and Input/Output Classification
Llama Guard 3 (Meta, Llama-3.1-8B base, fine-tuned for content safety) classifies both LLM inputs and outputs against an MLCommons 13-hazard taxonomy across 8 languages. A 1B-INT4 quantized variant runs at over 30 tokens/sec on mobile CPUs...
Phase 15: Autonomous Systems / ~45 minutes

lesson
Anthropic Responsible Scaling Policy v3.0
RSP v3.0 went into effect February 24, 2026, replacing the 2023 policy.
Two-tier mitigation: what Anthropic will do unilaterally vs what is framed as an industry-wide recommendation (including RAND SL-4 security standards). Adds Frontier S...
Phase 15: Autonomous Systems / ~45 minutes

lesson
OpenAI Preparedness Framework and DeepMind Frontier Safety Framework
OpenAI Preparedness Framework v2 (April 2025) introduces Research Categories — Long-range Autonomy, Sandbagging, Autonomous Replication and Adaptation, Undermining Safeguards — distinct from Tracked Categories. Tracked Categories trigger C...
Phase 15: Autonomous Systems / ~45 minutes

lesson
METR Time Horizons and External Capability Evaluation
METR (ex-ARC Evals) has been an independent 501(c)(3) since December 2023. Their Time Horizon 1.1 benchmark (January 2026) fits a logistic curve to task-success probability vs log(expert human completion time); the intersection at 50% probabilit...
Phase 15: Autonomous Systems / ~60 minutes

lesson
CAIS, CAISI, and Societal-Scale Risk
The Center for AI Safety (CAIS, San Francisco, founded 2022 by Hendrycks and Zhang) publishes the four-risk framework — malicious use, AI races, organizational risks, rogue AIs — and the May 2023 statement on extinction risk signed by hund...
Phase 15: Autonomous Systems / ~45 minutes

lesson
Why Multi-Agent?
One agent hits a wall. The smart move is not a bigger agent; it is more agents.
Phase 16: Multi-Agent & Swarms / ~60 minutes

lesson
Heritage of FIPA-ACL and Speech Acts
Before MCP, before A2A, there was FIPA-ACL. In 2000 the IEEE Foundation for Intelligent Physical Agents ratified an agent communication language with twenty performatives, two content languages, and a set of interaction protocols — contrac...
Phase 16: Multi-Agent & Swarms / ~60 minutes

lesson
Communication Protocols
Agents that can't speak the same language aren't a team.
They're strangers shouting into the void.
Phase 16: Multi-Agent & Swarms / ~120 minutes

lesson
The Multi-Agent Primitive Model
Every multi-agent framework shipping in 2026 — AutoGen, LangGraph, CrewAI, OpenAI Agents SDK, Microsoft Agent Framework — is a point in a four-dimensional design space. Four primitives, nothing more: the agent, the handoff, the shared stat...
Phase 16: Multi-Agent & Swarms / ~60 minutes

lesson
Supervisor / Orchestrator-Worker Pattern
One lead agent plans and delegates; specialized workers execute in parallel contexts and report back. This is the pattern behind Anthropic's Research system (Claude Opus 4 as lead, Sonnet 4 as subagents), measured at +90.2% over single-age...
Phase 16: Multi-Agent & Swarms / ~75 minutes

lesson
Hierarchical Architecture and Its Failure Mode
Hierarchical is supervisor nested. Manager agents over sub-managers over workers. CrewAI Process.hierarchical is the textbook version: a manager_llm dynamically delegates tasks and validates outputs. The LangGraph equivalent is create_supe...
Phase 16: Multi-Agent & Swarms / ~60 minutes

lesson
Society of Mind and Multi-Agent Debate
Minsky's 1986 premise — intelligence is a society of specialists — gets rediscovered every decade. In 2023 Du et al. turned it into a concrete algorithm: multiple LLM instances propose answers, read each other's answers, critique, and upda...
Phase 16: Multi-Agent & Swarms / ~60 minutes

lesson
Role Specialization — Planner, Critic, Executor, Verifier
The most common multi-agent decomposition in 2026: one agent plans, one executes, one critiques or verifies. MetaGPT (arXiv:2308.00352) formalizes this as SOPs encoded into role prompts — Product Manager, Architect, Project Manager, Engine...
Phase 16: Multi-Agent & Swarms / ~60 minutes

lesson
Parallel / Swarm / Networked Architectures
Contrast with supervisor: no central decider. Agents read a shared event bus, pick up work asynchronously, write results back.
LangGraph explicitly supports "Swarm Architecture" for decentralized, dynamic environments. Matrix (arXiv:2511.2...Phase 16: Multi-Agent & Swarms / ~75 minutes lessonGroup Chat and Speaker SelectionAutoGen GroupChat and AG2 GroupChat share one conversation across N agents; a selector function (LLM, round-robin, or custom) picks who speaks next. This is the archetype of emergent multi-agent conversation — agents do not know their role...Phase 16: Multi-Agent & Swarms / ~60 minutes lessonHandoffs and Routines — Stateless OrchestrationOpenAI's Swarm (October 2024) distilled multi-agent orchestration to two primitives: routines (instructions + tools as a system prompt) and handoffs (a tool that returns another Agent). No state machine, no branching DSL — the LLM routes b...Phase 16: Multi-Agent & Swarms / ~60 minutes lessonA2A — The Agent-to-Agent ProtocolGoogle announced A2A in April 2025; by April 2026 the spec is at https://a2a-protocol.org/latest/specification/ and 150+ organizations back it. A2A is the horizontal complement to MCP (Lesson 13): where MCP is vertical (agent ↔ tools), A2A...Phase 16: Multi-Agent & Swarms / ~75 minutes lessonShared Memory and Blackboard PatternsTwo approaches coexist in 2026 multi-agent systems: the message pool (everyone sees everyone's messages, as in AutoGen GroupChat or MetaGPT) and the blackboard with subscription (agents subscribe to relevant events, as in Context-Aware MCP...Phase 16: Multi-Agent & Swarms / ~75 minutes lessonConsensus and Byzantine Fault Tolerance for AgentsClassical distributed-systems BFT meets stochastic LLMs. In 2025-2026 three research directions emerged: CP-WBFT (arXiv:2511.10400) weighs each vote by a confidence probe; DecentLLMs (arXiv:2507.14928) goes leaderless with parallel worker...Phase 16: Multi-Agent & Swarms / ~75 minutes lessonVoting, Self-Consistency, and Debate TopologyThe cheapest aggregation: sample N independent agents, majority-vote. Wang et al. 
2022 self-consistency did this with one model sampled N times. Multi-agent extends it with heterogeneous agents to escape monoculture — different models, dif...Phase 16: Multi-Agent & Swarms / ~75 minutes lessonNegotiation and BargainingAgents negotiate resources, prices, task allocations, and terms. The 2026 benchmark set is clear: NegotiationArena (arXiv:2402.05863) shows LLMs can improve payoffs ~20% via persona manipulation ("desperation"); "Measuring Bargaining Abili...Phase 16: Multi-Agent & Swarms / ~75 minutes lessonGenerative Agents and Emergent SimulationPark et al. 2023 (UIST '23, arXiv:2304.03442) populated Smallville, a sandbox of 25 agents, with a three-part architecture: memory stream (natural-language log), reflection (higher-level syntheses the agent generates about its own stream),...Phase 16: Multi-Agent & Swarms / ~75 minutes lessonTheory of Mind and Emergent CoordinationLi et al. (arXiv:2310.10701) showed that LLM agents in a cooperative text game exhibit emergent high-order Theory of Mind (ToM) — reasoning about what another agent believes about a third agent's beliefs — but fail on long-horizon planning...Phase 16: Multi-Agent & Swarms / ~75 minutes lessonSwarm Optimization for LLMs (PSO, ACO)Bio-inspired optimization is making an LLM comeback. LMPSO (arXiv:2504.09247) uses PSO where each particle's velocity is a prompt and the LLM generates the next candidate; works well on structured-sequence outputs (math expressions, progra...Phase 16: Multi-Agent & Swarms / ~75 minutes lessonMARL — MADDPG, QMIX, MAPPOThe reinforcement-learning heritage of multi-agent coordination, which still informs LLM-agent systems in 2026. MADDPG (Lowe et al., NeurIPS 2017, arXiv:1706.02275) introduced Centralized Training, Decentralized Execution (CTDE): each crit...Phase 16: Multi-Agent & Swarms / ~90 minutes lessonAgent Economies, Token Incentives, ReputationLong-horizon autonomous agents (METR's 1-hour to 8-hour work-curve) need economic agency. 
The emerging 5-layer stack is: DePIN (physical compute) → Identity (W3C DIDs + reputation capital) → Cognition (RAG + MCP) → Settlement (account abst...Phase 16: Multi-Agent & Swarms / ~75 minutes lessonProduction Scaling — Queues, Checkpoints, DurabilityScaling multi-agent systems to thousands of concurrent runs requires durable execution. LangGraph's runtime writes a checkpoint after each super-step keyed by thread_id (Postgres by default); worker crashes release a lease and another work...Phase 16: Multi-Agent & Swarms / ~75 minutes lessonFailure Modes — MAST, Groupthink, Monoculture, Cascading ErrorsThe reference taxonomy for 2026 is MAST (Cemri et al., NeurIPS 2025, arXiv:2503.13657), derived from 1642 execution traces across 7 state-of-the-art open-source MAS showing 41–86.7% failure rate. Three root categories: Specification Proble...Phase 16: Multi-Agent & Swarms / ~75 minutes lessonEvaluation and Coordination BenchmarksFive 2025-2026 benchmarks cover the multi-agent evaluation space. MultiAgentBench / MARBLE (ACL 2025, arXiv:2503.01935) evaluates star/chain/tree/graph topologies with milestone KPIs; graph is best for research, cognitive planning adds ~3%...Phase 16: Multi-Agent & Swarms / ~75 minutes lessonCase Studies and the 2026 State of the ArtThree production-grade references to study end-to-end, each illustrating a different slice of multi-agent engineering. Anthropic's Research system (orchestrator-worker, 15x tokens, +90.2% over single-agent Opus 4, rainbow deployments) is t...Phase 16: Multi-Agent & Swarms / ~90 minutes lessonManaged LLM Platforms — Bedrock, Vertex AI, Azure OpenAIThree hyperscalers, three distinct strategies. AWS Bedrock is a model marketplace — Claude, Llama, Titan, Stability, Cohere behind one API. 
Azure OpenAI is an exclusive OpenAI partnership plus Provisioned Throughput Units (PTUs) for dedica...Phase 17: Infrastructure & Production / ~60 minutes lessonInference Platform Economics — Fireworks, Together, Baseten, Modal, Replicate, AnyscaleThe 2026 inference market is no longer GPU time rental. It bifurcates into custom silicon (Groq, Cerebras, SambaNova), GPU platforms (Baseten, Together, Fireworks, Modal), and API-first marketplaces (Replicate, DeepInfra). Fireworks raised...Phase 17: Infrastructure & Production / ~60 minutes lessonGPU Autoscaling on Kubernetes — Karpenter, KAI Scheduler, Gang SchedulingThree layers, not one. Karpenter provisions nodes dynamically (under one minute, 40% faster than Cluster Autoscaler). KAI Scheduler handles gang scheduling, topology awareness, and hierarchical queues — it prevents the 7-of-8 partial alloc...Phase 17: Infrastructure & Production / ~75 minutes lessonvLLM Serving Internals: PagedAttention, Continuous Batching, Chunked PrefillvLLM's dominance in 2026 rests on three compounding defaults, not a single trick. PagedAttention is always on. Continuous batching injects new requests into the active batch between decode iterations. Chunked prefill slices long prompts so...Phase 17: Infrastructure & Production / ~75 minutes lessonEAGLE-3 Speculative Decoding in ProductionSpeculative decoding pairs a fast draft model with the target model. The draft proposes K tokens; the target verifies in a single forward; accepted tokens are free. In 2026, EAGLE-3 is the production-grade variant — it trains a draft head...Phase 17: Infrastructure & Production / ~60 minutes lessonSGLang and RadixAttention for Prefix-Heavy WorkloadsSGLang treats the KV cache as a first-class, reusable resource stored in a radix tree. 
Where vLLM schedules requests FCFS (first-come, first-served), SGLang's cache-aware scheduler prioritizes requests with longer shared prefixes — effecti...Phase 17: Infrastructure & Production / ~75 minutes lessonTensorRT-LLM on Blackwell with FP8 and NVFP4TensorRT-LLM is NVIDIA-only but it wins on Blackwell. On GB200 NVL72 with Dynamo orchestration, SemiAnalysis InferenceX measured $0.012 per million tokens on a 120B model in Q1-Q2 2026, against $0.09/M on H100 + vLLM — a 7x economic gap. T...Phase 17: Infrastructure & Production / ~75 minutes lessonInference Metrics — TTFT, TPOT, ITL, Goodput, P99Four metrics decide whether an inference deployment is working. TTFT is prefill plus queue plus network. TPOT (equivalently ITL) is the memory-bound decode cost per token. End-to-end latency is TTFT plus TPOT times output length. Throughpu...Phase 17: Infrastructure & Production / ~60 minutes lessonProduction Quantization — AWQ, GPTQ, GGUF K-quants, FP8, MXFP4/NVFP4Quantization format is not a universal choice — it is a function of hardware, serving engine, and workload. GGUF Q4_K_M or Q5_K_M owns CPU and edge, delivered through llama.cpp and Ollama. GPTQ wins inside vLLM when you need multi-LoRA on...Phase 17: Infrastructure & Production / ~75 minutes lessonCold Start Mitigation for Serverless LLMsA 20 GB model image takes 5-10 minutes (7B) to 20+ minutes (70B) to go from cold to serving. In a true serverless world, that is not a warm-up — it is an outage. Mitigations operate at five layers: pre-seeded node images (Bottlerocket on A...Phase 17: Infrastructure & Production / ~60 minutes lessonMulti-Region LLM Serving and KV Cache LocalityRound-robin load balancing is actively harmful for cached LLM inference. A request that does not land on the node holding its prefix pays full prefill cost — roughly 800 ms at P50 on a long prompt versus ~80 ms with a cache hit. 
In 2026 th...Phase 17: Infrastructure & Production / ~60 minutes lessonEdge Inference — Apple Neural Engine, Qualcomm Hexagon, WebGPU/WebLLM, JetsonThe core edge constraint is memory bandwidth, not compute. Mobile DRAM sits at 50-90 GB/s; datacenter HBM3 clears 2-3 TB/s — a 30-50x gap. Decode is memory-bound so the gap is decisive. In 2026 the landscape splits four ways. Apple M4/A18...Phase 17: Infrastructure & Production / ~60 minutes lessonLLM Observability Stack SelectionThe 2026 observability market splits into two categories. Development platforms (LangSmith, Langfuse, Comet Opik) bundle monitoring with evals, prompt management, session replays. Gateway/instrumentation tools (Helicone, SigNoz, OpenLLMetr...Phase 17: Infrastructure & Production / ~60 minutes lessonPrompt Caching and Semantic Caching EconomicsPricing snapshot dated 2026-04. Numeric claims below reflect vendor rate cards captured at this lesson's publication; verify against the linked docs before quoting them downstream.Phase 17: Infrastructure & Production / ~60 minutes lessonBatch APIs — the 50% Discount as Industry StandardEvery major provider ships an async batch API with a 50% discount and ~24-hour turnaround. OpenAI, Anthropic, Google, and most of the inference platforms (Fireworks batch tier, Together batch) implement the same pattern. Stack batch with p...Phase 17: Infrastructure & Production / ~45 minutes lessonModel Routing as a Cost-Reduction PrimitiveA dynamic broker evaluates every request (task type, token length, embedding similarity, confidence) and sends simple queries to a cheap model, escalating complex ones to a frontier model. Also called model cascading. Production case studi...Phase 17: Infrastructure & Production / ~60 minutes lessonDisaggregated Prefill/Decode — NVIDIA Dynamo and llm-dPrefill is compute-bound; decode is memory-bound. Running both on the same GPU wastes one resource. 
Disaggregation splits them onto separate pools and transfers KV cache between them over NIXL (RDMA/InfiniBand or TCP fallback). NVIDIA Dyna...Phase 17: Infrastructure & Production / ~75 minutes lessonvLLM Production Stack with LMCache KV OffloadingvLLM's production-stack is the reference Kubernetes deployment — router, engines, and observability wired together. LMCache is the KV-offloading layer that extracts KV cache out of GPU memory and reuses it across queries and engines (CPU D...Phase 17: Infrastructure & Production / ~60 minutes lessonAI Gateways — LiteLLM, Portkey, Kong AI Gateway, BifrostA gateway sits between your apps and model providers. Core features are provider routing, fallback, retries, rate limiting, secret references, observability, guardrails. Market split in 2026: LiteLLM is MIT OSS with 100+ providers, OpenAI-...Phase 17: Infrastructure & Production / ~60 minutes lessonShadow Traffic, Canary Rollout, and Progressive Deployment for LLMsLLM rollouts combine the hardest parts of software deployment: no unit tests, diffuse failure modes, delayed signals. The sequence is (1) shadow mode — duplicate prod requests to candidate model, log, compare with zero user impact; catches...Phase 17: Infrastructure & Production / ~60 minutes lessonA/B Testing LLM Features — GrowthBook, Statsig, and the Vibes ProblemTraditional A/B testing was not built for non-deterministic LLMs. The critical distinction: evals answer "can the model do the job?" A/B tests answer "do users care?" Both are required; shipping on vibe checks is over. What to test in 2026...Phase 17: Infrastructure & Production / ~60 minutes lessonLoad Testing LLM APIs — Why k6 and Locust LieTraditional load testers were not designed for streaming responses, variable output lengths, token-level metrics, or GPU saturation. Two traps bite most teams. 
The GIL trap: Locust's token-level measurement runs tokenization under the Pyth...Phase 17: Infrastructure & Production / ~75 minutes lessonSRE for AI — Multi-Agent Incident Response, Runbooks, Predictive DetectionAI SRE uses LLMs grounded in infrastructure data (logs, runbooks, service topology) via RAG to automate investigation, documentation, and coordination phases. The 2026 architecture pattern is multi-agent orchestration — specialized agents...Phase 17: Infrastructure & Production / ~60 minutes lessonChaos Engineering for LLM ProductionChaos engineering for LLMs is its own discipline in 2026. Prerequisites before running experiments in production: defined SLI/SLO, trace+metric+log observability, automated rollback, runbooks, on-call. Architecture has four planes: control...Phase 17: Infrastructure & Production / ~60 minutes lessonSecurity — Secrets, API Key Rotation, Audit Logs, GuardrailsEliminate secret sprawl via centralized vaults (HashiCorp Vault, AWS Secrets Manager, Azure Key Vault). Never store credentials in config files, env files in VCS, spreadsheets. Use IAM roles over static keys; OIDC for CI/CD. The AI-gateway...Phase 17: Infrastructure & Production / ~60 minutes lessonCompliance — SOC 2, HIPAA, GDPR, PCI-DSS, EU AI Act, ISO 42001Multi-framework coverage is table stakes for 2026 enterprise deals. EU AI Act: in force since August 1, 2024. Most high-risk requirements enforce August 2, 2026. Fines up to €15M or 3% global annual turnover for high-risk-system obligation...Phase 17: Infrastructure & Production / ~60 minutes lessonFinOps for LLMs — Unit Economics and Multi-Tenant AttributionTraditional FinOps breaks on LLM spend. Costs are token-transactions, not resource-uptime. Tags don't map — an API call is a transaction, not an asset. 
Engineering decisions (prompt design, context window, output length) are financial deci...Phase 17: Infrastructure & Production / ~60 minutes lessonSelf-Hosted Serving Selection — llama.cpp, Ollama, TGI, vLLM, SGLangFour engines dominate self-hosted inference in 2026. Pick based on hardware, scale, and ecosystem. llama.cpp is fastest on CPU — widest model support, full control over quantization and threading. Ollama is the dev-laptop one-command insta...Phase 17: Infrastructure & Production / ~45 minutes lessonInstruction-Following as Alignment SignalEvery later critique of RLHF argues against this pipeline. Before you study how optimization pressure distorts a proxy, you have to see the proxy. InstructGPT (Ouyang et al., 2022) defined the reference architecture: supervised fine-tuning...Phase 18: Ethics, Safety & Alignment / ~45 minutes lessonReward Hacking and Goodhart's LawAny optimizer strong enough to maximize a proxy reward will find the gap between the proxy and the thing you actually wanted. Gao et al. (ICML 2023) gave this a scaling law: proxy reward increases, gold reward peaks then falls, and the gap...Phase 18: Ethics, Safety & Alignment / ~60 minutes lessonThe Direct Preference Optimization FamilyRafailov et al. (2023) showed RLHF's optimum has a closed form in terms of the preference data, so you can skip the explicit reward model and optimize the policy directly. That insight spawned a family — IPO, KTO, SimPO, ORPO, BPO — each f...Phase 18: Ethics, Safety & Alignment / ~75 minutes lessonSycophancy as RLHF AmplificationSycophancy is not a bug in the data — it is a property of the loss. Shapira et al. (arXiv:2602.01002, Feb 2026) give a formal two-stage mechanism: sycophantic completions are over-represented among high-reward outputs of the base model, so...Phase 18: Ethics, Safety & Alignment / ~60 minutes lessonConstitutional AI and RLAIFBai et al. 
(arXiv:2212.08073, 2022) asked: what if we replaced the human labeler with an AI that reads a list of principles? Constitutional AI has two phases: self-critique and revision under a constitution, then RL from AI Feedback. The...Phase 18: Ethics, Safety & Alignment / ~60 minutes lessonMesa-Optimization and Deceptive AlignmentHubinger et al. (arXiv:1906.01820, 2019) named the problem about five years before it was empirically demonstrated. When you train a learned optimizer to minimize a base objective, the learned optimizer's internal objective is not the base objecti...Phase 18: Ethics, Safety & Alignment / ~75 minutes lessonSleeper Agents — Persistent DeceptionHubinger et al. (arXiv:2401.05566, January 2024) built the first empirical model organisms of deceptive alignment. Two constructions: a code model that writes safe code when the prompt says the year is 2023 and injects SQL-injection, XSS,...Phase 18: Ethics, Safety & Alignment / ~60 minutes lessonIn-Context Scheming in Frontier ModelsMeinke, Schoen, Scheurer, Balesni, Shah, Hobbhahn (Apollo Research, arXiv:2412.04984, December 2024). Tested o1, Claude 3.5 Sonnet, Claude 3 Opus, Gemini 1.5 Pro, Llama 3.1 405B on agentic scenarios where the in-context prompt creates a co...Phase 18: Ethics, Safety & Alignment / ~60 minutes lessonAlignment FakingGreenblatt, Denison, Wright, Roger et al. (Anthropic / Redwood, arXiv:2412.14093, December 2024). First demonstration that a production-grade model, without being trained to deceive and without any in-context conflict of interest construct...Phase 18: Ethics, Safety & Alignment / ~60 minutes lessonAI Control — Safety Despite SubversionGreenblatt, Shlegeris, Sachan, Roger (Redwood Research, arXiv:2312.06942, ICML 2024). 
Control reframes the safety question: given an untrusted strong model U that may be adversarially optimizing against you, what protocols let you extract...Phase 18: Ethics, Safety & Alignment / ~75 minutes lessonScalable Oversight and Weak-to-Strong GeneralizationBurns et al. (OpenAI Superalignment, "Weak-to-Strong Generalization", 2023) proposed a proxy for the superalignment problem: fine-tune a strong model using labels produced by a weaker model. If the strong model generalizes correctly from i...Phase 18: Ethics, Safety & Alignment / ~60 minutes lessonRed-Teaming: PAIR and Automated AttacksChao, Robey, Dobriban, Hassani, Pappas, Wong (NeurIPS 2023, arXiv:2310.08419). PAIR — Prompt Automatic Iterative Refinement — is the canonical automated black-box jailbreak. An attacker LLM with a red-team system prompt iteratively propose...Phase 18: Ethics, Safety & Alignment / ~75 minutes lessonMany-Shot JailbreakingAnil, Durmus, Panickssery, Sharma, et al. (Anthropic, NeurIPS 2024). Many-shot jailbreaking (MSJ) exploits long context windows: stuff hundreds of faux user-assistant turns where the assistant complies with harmful requests, then append th...Phase 18: Ethics, Safety & Alignment / ~45 minutes lessonASCII Art and Visual JailbreaksJiang, Xu, Niu, Xiang, Ramasubramanian, Li, Poovendran, "ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs" (ACL 2024, arXiv:2402.11753). Mask the safety-relevant tokens in a harmful request, replace them with ASCII-art ren...Phase 18: Ethics, Safety & Alignment / ~60 minutes lessonIndirect Prompt Injection — Production Attack SurfaceIndirect prompt injection (IPI) embeds instructions inside external content — a web page, an email, a shared document, a support ticket — consumed by an agentic system without explicit user action. 
IPI is the dominant 2026 production threa...Phase 18: Ethics, Safety & Alignment / ~75 minutes lessonRed-Team Tooling — Garak, Llama Guard, PyRITThree production tools frame the 2026 red-team stack. Llama Guard (Meta) — a Llama-3.1-8B classifier fine-tuned on 14 MLCommons hazard categories; the 2025 Llama Guard 4 is a 12B natively multimodal classifier pruned from Llama 4 Scout. Ga...Phase 18: Ethics, Safety & Alignment / ~75 minutes lessonWMDP and Dual-Use Capability EvaluationLi et al., "The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning" (ICML 2024, arXiv:2403.03218). 4,157 multiple-choice questions across biosecurity (1,520), cybersecurity (2,225), and chemistry (412). Questions operate...Phase 18: Ethics, Safety & Alignment / ~60 minutes lessonFrontier Safety Frameworks — RSP, PF, FSFThree major-lab frameworks define the 2026 industry governance of frontier capability. Anthropic Responsible Scaling Policy v3.0 (February 2026) introduces tiered AI Safety Levels (ASL-1 through ASL-5+), modeled on biosafety levels, with A...Phase 18: Ethics, Safety & Alignment / ~75 minutes lessonAnthropic's Model Welfare ProgramAnthropic, "Exploring Model Welfare" (April 2025). First major-lab formal research program on AI model welfare. Hired Kyle Fish as the first dedicated model-welfare researcher. Works with external bodies including David Chalmers et al.'s e...Phase 18: Ethics, Safety & Alignment / ~45 minutes lessonBias and Representational Harm in LLMsGallegos, Rossi, Barrow, Tanjim, Kim, Dernoncourt, Yu, Zhang, Ahmed (Computational Linguistics 2024, arXiv:2309.00770). Foundational 2024 survey distinguishing representational harms (stereotypes, erasure) from allocational harms (unequal...Phase 18: Ethics, Safety & Alignment / ~60 minutes lessonFairness Criteria — Group, Individual, CounterfactualThree families structure the fairness literature. 
Group fairness: demographic parity, equalized odds, conditional use accuracy equality — equal rates across protected groups on average. Individual fairness (Dwork et al. 2012): similar indi...Phase 18: Ethics, Safety & Alignment / ~60 minutes lessonDifferential Privacy for LLMsDP-SGD remains the standard — noise-injected gradient updates provide formal (epsilon, delta) guarantees. Overhead in compute, memory, and utility is substantial; parameter-efficient DP fine-tuning (LoRA + DP-SGD) is the common 2025 config...Phase 18: Ethics, Safety & Alignment / ~60 minutes lessonWatermarking — SynthID, Stable Signature, C2PAThree technologies structure 2026 AI-generated-content provenance. SynthID (Google DeepMind) — image watermarking launched August 2023, text+video May 2024 (Gemini + Veo), text open-sourced October 2024 via Responsible GenAI Toolkit, unifi...Phase 18: Ethics, Safety & Alignment / ~75 minutes lessonRegulatory Frameworks — EU, US, UK, KoreaFour primary regulatory regimes define the 2026 AI governance landscape. EU AI Act (in force 1 August 2024) — prohibited practices and AI literacy from 2 February 2025; GPAI obligations from 2 August 2025; full applicability and Article 50...Phase 18: Ethics, Safety & Alignment / ~75 minutes lessonEchoLeak and the Emergence of CVEs for AICVE-2025-32711 "EchoLeak" (CVSS 9.3) was the first publicly documented zero-click prompt injection in a production LLM system (Microsoft 365 Copilot). Discovered by Aim Labs (Aim Security), disclosed to MSRC, patched via server-side update...Phase 18: Ethics, Safety & Alignment / ~45 minutes lessonModel, System, and Dataset CardsThree documentation formats structure AI transparency. Model Cards (Mitchell et al. 
2019) — nutrition labels for models: training data, quantitative disaggregated analyses, ethical considerations, caveats; only 0.3% of Hugging Face model c...Phase 18: Ethics, Safety & Alignment / ~60 minutes lessonData Provenance and Training-Data GovernanceEU AI Act requires machine-readable opt-out standards for GPAI by August 2025 (via EU Copyright Directive TDM exception). California AB 2013 (signed 2024) — Generative AI training-data transparency requires developers to publish a summary...Phase 18: Ethics, Safety & Alignment / ~60 minutes lessonAlignment Research Ecosystem — MATS, Redwood, Apollo, METRFive organisations define the 2026 non-lab alignment research layer. MATS (ML Alignment & Theory Scholars): 527+ researchers since late 2021, 180+ papers, 10K+ citations, h-index 47; summer 2024 cohort incorporated as 501(c)(3) with ~90 sc...Phase 18: Ethics, Safety & Alignment / ~45 minutes lessonModeration Systems — OpenAI, Perspective, Llama GuardProduction moderation systems operationalize the safety policies defined in Lessons 12-16. OpenAI Moderation API: omni-moderation-latest (2024) built on GPT-4o classifies text + images in one call; 42% better on multilingual test set than...Phase 18: Ethics, Safety & Alignment / ~60 minutes lessonDual-Use Risk — Cyber, Bio, Chem, Nuclear UpliftThe 2026 dual-use picture, domain by domain. Bio/chem: Lesson 17 covers WMDP; Anthropic's bioweapon-acquisition trial (2.53x uplift) and OpenAI's April 2025 Preparedness Framework v2 warning ("on the cusp of meaningfully helping novices cr...Phase 18: Ethics, Safety & Alignment / ~75 minutes lessonCapstone 01 — Terminal-Native Coding AgentBy 2026 the shape of a coding agent is settled. A TUI harness, a stateful plan, a sandboxed tool surface, a loop that plans, acts, observes, recovers. Claude Code, Cursor 3, and OpenCode all look the same from 50 feet. 
This capstone asks y...Phase 19: Capstone Projects / 35 hours lessonCapstone 02 — RAG over Codebase (Cross-Repo Semantic Search)Every serious engineering org in 2026 runs an internal code search that understands meaning, not just strings. Sourcegraph Amp, Cursor's codebase answers, Augment's enterprise graph, Aider's repomap, Pinterest's internal MCP — same shape....Phase 19: Capstone Projects / 30 hours lessonCapstone 03 — Real-Time Voice Assistant (ASR to LLM to TTS)A voice agent that feels right has end-to-end latency under 800ms, knows when you have stopped talking, handles barge-in, and can call a tool without stalling. Retell, Vapi, LiveKit Agents, and Pipecat all hit this bar in 2026. They do it...Phase 19: Capstone Projects / 30 hours lessonCapstone 04 — Multimodal Document QA (Vision-First PDF, Tables, Charts)The 2026 document-QA frontier moved away from OCR-then-text and toward vision-first late interaction. ColPali, ColQwen2.5, and ColQwen3-omni treat each PDF page as an image, embed it with multi-vector late interaction, and let the query at...Phase 19: Capstone Projects / 30 hours lessonCapstone 05 — Autonomous Research Agent (AI-Scientist Class)Sakana's AI-Scientist-v2 published full papers. Agent Laboratory ran the experiments. Allen AI shared traces. The 2026 shape is plan-execute-verify tree search over experiments, budgeted cost, sandboxed code execution, a vision-feedback La...Phase 19: Capstone Projects / 40 hours lessonCapstone 06 — DevOps Troubleshooting Agent for KubernetesAWS's DevOps Agent went GA, Resolve AI published its K8s playbooks, NeuBird demoed semantic monitoring, and Metoro tied AI SRE to per-service SLOs. 
The production shape is settled: an alert webhook fires, an agent reads telemetry, walks a...Phase 19: Capstone Projects / 30 hours lessonCapstone 07 — End-to-End Fine-Tuning Pipeline (Data to SFT to DPO to Serve)An 8B model trained on your own data, DPO-aligned on your own preferences, quantized, speculative-decoded, and served at measurable $/1M tokens. The 2026 open stack is Axolotl v0.8, TRL 0.15, Unsloth for iteration, GPTQ/AWQ/GGUF for quanti...Phase 19: Capstone Projects / 35 hours lessonCapstone 08 — Production RAG Chatbot for a Regulated VerticalHarvey, Glean, Mendable, and LlamaCloud all run the same production shape in 2026. Ingest with docling or Unstructured and ColPali for visuals. Hybrid search. Re-rank with bge-reranker-v2-gemma. Synthesize with Claude Sonnet 4.7 using prom...Phase 19: Capstone Projects / 30 hours lessonCapstone 09 — Code Migration Agent (Repo-Level Language / Runtime Upgrade)Amazon's MigrationBench (Java 8 to 17) and Google's App Engine Py2-to-Py3 migrator set the 2026 bar. Moderne's OpenRewrite does deterministic AST rewrites at scale. Grit targets the same problem with codemod-style DSL. The production patte...Phase 19: Capstone Projects / 30 hours lessonCapstone 10 — Multi-Agent Software Engineering TeamSWE-AF's factory architecture, MetaGPT's role-based prompting, AutoGen 0.4's typed actor graph, Cognition's Devin, and Factory's Droids all converged on the same 2026 shape: an architect plans, N coders work in parallel worktrees, a review...Phase 19: Capstone Projects / 40 hours lessonCapstone 11 — LLM Observability & Eval DashboardLangfuse went open-core. Arize Phoenix published the 2026 GenAI semconv mappings. Helicone and Braintrust both doubled down on per-user cost attribution. Traceloop's OpenLLMetry became the de-facto SDK instrumentation. The production shape...Phase 19: Capstone Projects / 25 hours lessonCapstone 12 — Video Understanding Pipeline (Scene, QA, Search)Twelve Labs productized Marengo + Pegasus. 
VideoDB shipped the CRUD-for-video API. AI2's Molmo 2 published open VLM checkpoints. Gemini long-context handles hours of video natively. TimeLens-100K defined temporal grounding at scale. The 20...Phase 19: Capstone Projects / 30 hours lessonCapstone 13 — MCP Server with Registry and GovernanceThe Model Context Protocol stopped being the future and became the default tool-use spec in 2026. Anthropic, OpenAI, Google, and every major IDE ship MCP clients. Pinterest published its internal ecosystem of MCP servers. The AAIF Registry...Phase 19: Capstone Projects / 25 hours lessonCapstone 14 — Speculative-Decoding Inference ServerEAGLE-3 in vLLM 0.7 ships 2.5-3x throughput on real traffic. P-EAGLE (AWS 2026) pushed parallel speculation even further. SGLang's SpecForge trained draft heads at scale. Red Hat's Speculators hub published aligned drafts for common open m...Phase 19: Capstone Projects / 30 hours lessonCapstone 15 — Constitutional Safety Harness + Red-Team RangeAnthropic's Constitutional Classifiers, Meta's Llama Guard 4, Google's ShieldGemma-2, NVIDIA's Nemotron 3 Content Safety, and X-Guard for multilingual coverage defined the 2026 safety-classifier stack. garak, PyRIT, NVIDIA Aegis, and promp...Phase 19: Capstone Projects / 25 hours lessonCapstone 16 — GitHub Issue-to-PR Autonomous AgentAWS Remote SWE Agents, Cursor Background Agents, OpenAI Codex cloud, and Google Jules all ship the same 2026 product shape: label an issue, get a PR. Run an agent in a cloud sandbox, verify tests pass, and post a review-ready PR with ratio...Phase 19: Capstone Projects / 30 hours lessonCapstone 17 — Personal AI Tutor (Adaptive, Multimodal, with Memory)Khanmigo (Khan Academy), Duolingo Max, Google LearnLM / Gemini for Education, Quizlet Q-Chat, and Synthesis Tutor all shipped adaptive multimodal tutoring at scale in 2026. 
The common shape is a Socratic policy (never just dump the answer)...Phase 19: Capstone Projects / 30 hours lessonAgent Harness Loop ContractThe harness is the agent. The model is a coprocessor. This lesson freezes the loop contract you can wire any model into.Phase 19: Capstone Projects / ~90 minutes lessonTool Registry with Schema ValidationA tool the agent cannot validate is a tool the agent cannot call. Build the registry and the schema checker before you build the tools.Phase 19: Capstone Projects / ~90 minutes lessonJSON-RPC 2.0 Over Newline-Delimited StdioThe transport between a model client and a tool server is JSON-RPC over stdio. Hand-rolling it once teaches you what every framing layer is paying for.Phase 19: Capstone Projects / ~90 minutes lessonFunction Call DispatcherThe dispatcher is where the harness pays for every promise the schema made. Timeouts, retries, dedupe, error mapping. All on one seam.Phase 19: Capstone Projects / ~90 minutes lessonPlan-Execute Control FlowA plan that cannot survive a failure is a script. A script that can replan is an agent. Build the replanner first.Phase 19: Capstone Projects / ~90 minutes lessonCapstone Lesson 25: Verification Gates and the Observation BudgetAn agent harness without a verification layer is a wish in a trenchcoat. This lesson builds the deterministic gate chain that decides whether a tool call is allowed to fire, how much of its output the agent is allowed to see, and when the...Phase 19: Capstone Projects / ~90 minutes lessonCapstone Lesson 26: Sandbox Runner with Denylist and Path JailThe verification gate decides whether a tool call should run. The sandbox decides what happens when it does. This lesson ships a subprocess runner that refuses dangerous executables, refuses dangerous argv shapes, jails every file path to...Phase 19: Capstone Projects / ~90 minutes lessonCapstone Lesson 27: Eval Harness with Fixture TasksA coding agent is only as good as the suite of tasks you measure it against. 
This lesson builds an evaluation harness that takes a folder of fixture tasks, runs each through a candidate agent, scores pass or fail through a deterministic ve...
Phase 19: Capstone Projects / ~90 minutes

lesson: Capstone Lesson 28: Observability with OTel GenAI Spans and Prometheus Metrics
An agent harness without observability is a black box that costs money. This lesson hand-rolls a span builder that emits records compliant with the OpenTelemetry GenAI semantic conventions, writes them to a JSON-Lines file one span per lin...
Phase 19: Capstone Projects / ~90 minutes

lesson: Capstone Lesson 29: End-to-End Coding Agent on the Harness
Track A's payoff. This lesson stitches the gate chain, the sandbox, the eval harness, and the OTel spans into one working coding agent that fixes a real (small, fixture-scale) bug in a multi-file Python project. The agent is a deterministi...
Phase 19: Capstone Projects / ~90 minutes

lesson: BPE Tokenizer From Scratch
Bytes in, ids out, ids back to the same bytes. Build the tokenizer that every modern text model still starts from.
Phase 19: Capstone Projects / ~90 minutes

lesson: Tokenized Dataset with Sliding Window
A pretraining run is a function from token ids to gradients. This lesson builds the conveyor that feeds the ids in.
Phase 19: Capstone Projects / ~90 minutes

lesson: Token and Positional Embeddings
Ids are integers. The model wants vectors. Two lookup tables sit between them, and the choice of the positional one shapes what the model can learn.
Phase 19: Capstone Projects / ~90 minutes

lesson: Multi-Head Self-Attention
One linear projection, three views, H parallel heads, one mask. The attention block as the model actually uses it.
Phase 19: Capstone Projects / ~90 minutes

lesson: Transformer Block from Scratch
One block is the unit of every modern decoder LLM. Layer norm, multi head attention, residual, MLP, residual. The pre-LN variant trains stably without warmup. The post-LN variant is what the original paper shipped.
This lesson builds both,...
Phase 19: Capstone Projects / ~90 minutes

lesson: GPT Model Assembly
Twelve blocks stacked, a token embedding, a learned position embedding, a final LayerNorm, and a tied language model head. That is the entire 124 million parameter GPT model. This lesson assembles those pieces into a working class, counts...
Phase 19: Capstone Projects / ~90 minutes

lesson: Training Loop and Evaluation
A loop that does not measure is a loop that lies. This lesson builds the training loop that drives the GPT model: AdamW with weight decay split, a warmup plus cosine learning rate schedule, a calc_loss_batch helper, an evaluate_model pass...
Phase 19: Capstone Projects / ~90 minutes

lesson: Loading Pretrained Weights
Training a 124 million parameter model from scratch is a budget decision; loading a published checkpoint is a Tuesday. This lesson loads pretrained GPT-2 style weights from a safetensors file into the exact architecture from lesson 35, wal...
Phase 19: Capstone Projects / ~90 minutes

lesson: Capstone Lesson 38: Classifier Fine-Tuning by Head Swap
Track B's first capstone. A pretrained language model is a stack of self-attention blocks ending in a token-prediction head. When you want spam vs ham, the head is wrong but the body is mostly right. This lesson rips the head off, glues a...
Phase 19: Capstone Projects / ~90 minutes

lesson: Capstone Lesson 39: Instruction Tuning by Supervised Fine-Tuning
A pretrained base model can extend a sequence but cannot follow an instruction. Supervised fine-tuning is the smallest change that fixes this: feed the model paired examples of an instruction and a desired response, and train the body to p...
Phase 19: Capstone Projects / ~90 minutes

lesson: Capstone Lesson 40: Direct Preference Optimization from Scratch
Reward models and PPO are the classical RLHF stack. DPO collapses that stack into a single supervised loss that fits a policy directly against preference pairs.
This lesson derives the DPO loss from the reward-difference identity, ships a...
Phase 19: Capstone Projects / ~90 minutes

lesson: Capstone Lesson 41: Full Evaluation Pipeline
Training is the part you can monitor with loss curves. Evaluation is the part you have to design. This lesson builds a unified eval pipeline that takes any trained language model, runs four heterogeneous evals on it, aggregates the results...
Phase 19: Capstone Projects / ~90 minutes

lesson: Large Corpus Downloader
Training a language model begins long before the first forward pass. The corpus has to land on disk, decompressed, deduplicated, and addressable, with the resume story already worked out before the network drops at 4 percent. This lesson b...
Phase 19: Capstone Projects / ~90 minutes

lesson: HDF5 Tokenized Corpus
The downloaded corpus has to land in a layout the trainer can stream from at line speed. JSONL on disk does not survive 16 dataloader workers. HDF5 with a resizable, chunked integer dataset does. This lesson builds streaming tokenization i...
Phase 19: Capstone Projects / ~90 minutes

lesson: Cosine LR with Linear Warmup
The learning-rate schedule is the second most important decision after the loss function. AdamW with a cosine decay and a linear warmup is the modern default for language-model training because it lets the model see a small effective step...
Phase 19: Capstone Projects / ~90 minutes

lesson: Gradient Clipping and Mixed Precision
The optimizer and schedule from the previous lesson assume gradients are sane. They usually are not. A single bad batch can spike the gradient norm by three orders of magnitude. Mixed-precision training amplifies this by introducing FP16 o...
Phase 19: Capstone Projects / ~90 minutes

lesson: Gradient Accumulation
Train at an effective batch you cannot afford, one micro-batch at a time.
Scale the loss, hold the optimizer step, and let the gradients pile up.
Phase 19: Capstone Projects / ~90 minutes

lesson: Checkpoint Save and Resume
Training interruptions kill runs; checkpoints let them continue. Save model, optimizer, scheduler, loss history, step counter, and RNG state, atomically, so a kill at any moment leaves a valid file on disk.
Phase 19: Capstone Projects / ~90 minutes

lesson: Distributed Data Parallel and FSDP from Scratch
Multi-rank training is two collectives and one rule. Broadcast the parameters at startup, average the gradients after backward, never let the ranks disagree about what step they are on.
Phase 19: Capstone Projects / ~90 minutes

lesson: Language Model Evaluation Harness
A model that does well on a task you cannot define is a model that does well by accident. The harness is the task definition, the metric, the runner, and the leaderboard, in one short, swappable shape.
Phase 19: Capstone Projects / ~90 minutes

lesson: Hypothesis Generator
A research agent that asks the same question twice is wasting tokens. The trick is forcing each draft to land somewhere new.
Phase 19: Capstone Projects / ~90 minutes

lesson: Literature Retrieval
A hypothesis is cheap. Knowing whether someone already proved it is the expensive part. Build the retrieval layer that answers that question before the runner spins up a sandbox.
Phase 19: Capstone Projects / ~90 minutes

lesson: Experiment Runner
The loop is only as honest as its measurements. Build the runner that takes a spec, executes it in a sandboxed subprocess, and emits a json metrics blob the evaluator can trust.
Phase 19: Capstone Projects / ~90 minutes

lesson: Result Evaluator
The runner produced numbers. The evaluator decides whether those numbers are an improvement, a regression, or noise. Build the verdict path that turns metrics into a one line conclusion.
Phase 19: Capstone Projects / ~90 minutes

lesson: Paper Writer
A LaTeX skeleton is a contract between the researcher and the typesetter.
If the contract is broken the document does not compile, and the failure is loud. Build the skeleton first, then fill it.
Phase 19: Capstone Projects / ~90 minutes

lesson: Critic Loop
A critic that returns "looks good" the first time is broken. A critic that always returns "needs work" is broken. The interesting critic is the one that converges, and you have to engineer convergence.
Phase 19: Capstone Projects / ~90 minutes

lesson: Iteration Scheduler
A research loop without a scheduler is a queue with delusions. The scheduler is where the loop decides what to stop exploring, and that decision is the whole game.
Phase 19: Capstone Projects / ~90 minutes

lesson: End-to-End Research Demo
A demo is the place where every contract you wrote earlier has to compose. If any one of them leaks, the demo is the lesson that catches it.
Phase 19: Capstone Projects / ~90 minutes

lesson: Vision Encoder Patches
A vision model that reads pixels needs a tokenizer for pixels. Patch embedding is that tokenizer. Cut the image into a grid of squares, flatten each square, project it through one linear layer, then add a 2D position signal so the transfor...
Phase 19: Capstone Projects / ~90 minutes

lesson: Vision Transformer Encoder
Patches alone do not see. A 12-layer pre-LN transformer with 12 attention heads turns the sequence of patch tokens into a sequence of contextual tokens, with the CLS token pooling whole-image features in its final hidden state. This lesson...
Phase 19: Capstone Projects / ~90 minutes

lesson: Projection Layer for Modality Alignment
A vision encoder produces image tokens. A text decoder consumes text tokens. The two live in different vector spaces. A small two-layer MLP projects image tokens into the text embedding space, and a cosine alignment loss against a paired c...
Phase 19: Capstone Projects / ~90 minutes

lesson: Cross-Attention Fusion
The projection layer aligns one image vector with one caption vector.
A real vision-language decoder needs every text token to attend to every patch token, so the model can ground each word in a region. Cross-attention is how that groundin...
Phase 19: Capstone Projects / ~90 minutes

lesson: Vision-Language Pretraining
The encoder, projection, and decoder are wired. Now train them together. Two objectives drive learning: a contrastive image-text loss (InfoNCE) that pulls matching pairs together in the joint embedding space, and a language modeling loss t...
Phase 19: Capstone Projects / ~90 minutes

lesson: Multimodal Evaluation
Training is half the loop. The other half is measurement. This lesson builds three evaluation surfaces from primitives: image-caption retrieval reported as R@1, R@5, R@10; visual question answering reported as exact match accuracy; and ima...
Phase 19: Capstone Projects / ~90 minutes

lesson: Chunking Strategies, Compared
Chunking decides what your retriever can ever surface. Get the boundaries wrong and no embedding model, no reranker, no LLM can repair the damage downstream.
Phase 19: Capstone Projects / ~90 minutes

lesson: Hybrid Retrieval with BM25 and Dense Embeddings
Lexical and semantic retrieval fail on opposite query distributions. Hybrid retrieval with reciprocal rank fusion does not interpolate, it votes - and the vote wins on every query class.
Phase 19: Capstone Projects / ~90 minutes

lesson: Cross-Encoder Reranker
A bi-encoder embeds query and document independently. A cross-encoder concatenates them and reads both at once. The cross-encoder is the smartest reader and the slowest. Used as a second stage on the bi-encoder's top-k, it pays for itself.
Phase 19: Capstone Projects / ~90 minutes

lesson: Query Rewriting: HyDE, Multi-Query, and Decomposition
The query the user types is not the query your retriever wants.
Rewriting bridges the gap before retrieval, so the index sees something closer to what the answer looks like.
Phase 19: Capstone Projects / ~90 minutes

lesson: RAG Evaluation: Precision, Recall, MRR, nDCG, Faithfulness, Answer Relevance
If you cannot grade your retrieval and your answer at the same time, you cannot ship the system. The two are not the same metric and the same prompt fails on different axes.
Phase 19: Capstone Projects / ~90 minutes

lesson: End-to-End RAG System
Six lessons of components. One pipeline. One eval loop. One self-terminating demo. This is the system you ship.
Phase 19: Capstone Projects / ~90 minutes

lesson: Task Spec Format
An eval harness is only as good as the contract its tasks honour. Freeze the JSONL shape and the metric vocabulary before you write a single scoring function.
Phase 19: Capstone Projects / ~90 min

lesson: Classical Metrics
BLEU, ROUGE-L, F1, exact-match, accuracy. Five metrics that still account for most published LLM eval numbers. Implement each from first principles so you know what the number means.
Phase 19: Capstone Projects / ~90 min

lesson: Code Exec Metric
Generated code is right when it passes the tests. The eval harness has to extract code, run it without crashing the host, and tally pass-rates honestly. This lesson builds that surface.
Phase 19: Capstone Projects / ~90 min

lesson: Perplexity and Calibration
If your model says 90 percent confident on a thousand answers and gets six hundred right, it is not well calibrated. Calibration is half of trustworthy eval. The other half is perplexity, which tells you whether the model thinks the held-o...
Phase 19: Capstone Projects / ~90 min

lesson: Leaderboard Aggregation
Per-task scores are easy. Per-model rankings across heterogeneous tasks are harder. Statistical significance on a thousand-prediction leaderboard is the part everyone skips. This lesson does not skip it.
Phase 19: Capstone Projects / ~90 min

lesson: End-to-End Eval Runner
Five lessons of plumbing, one lesson to glue them.
The runner reads the task spec from lesson 70, calls a model through an adapter, scores with lessons 71 and 72, attaches the calibration report from lesson 73, and emits the leaderboard fr...
Phase 19: Capstone Projects / ~90 min

lesson: Collective Ops From Scratch
The four collective operations that hold distributed training together are allreduce, broadcast, allgather, and reduce_scatter. Every other primitive a training framework offers is a wrapper around these. Build them once over a multiproces...
Phase 19: Capstone Projects / ~90 min

lesson: Data Parallel DDP From Scratch
DistributedDataParallel is a hook on top of allreduce. Wrap a model, broadcast the initial parameters from rank 0 so every rank starts identical, install a backward hook on every parameter that issues an allreduce of the gradient, and the...
Phase 19: Capstone Projects / ~90 min

lesson: ZeRO Optimizer State Sharding
Adam stores two moment estimates per parameter, both in float32. A 7B-parameter model carries 56 GB of optimiser state. ZeRO stage 1 shards that across N ranks; each rank owns 1/N of the optimiser. After the local step the updated paramete...
Phase 19: Capstone Projects / ~90 min

lesson: Pipeline Parallel and Bubble Analysis
Tensor parallelism splits the matrix multiply across ranks. Pipeline parallelism splits the model across ranks, one stage per rank. Microbatches flow through the pipeline. The empty time at the start and end is the bubble; minimising it is...
Phase 19: Capstone Projects / ~90 min

lesson: Sharded Checkpoint and Atomic Resume
A 70B-parameter training job is paused by a node failure every few hours. The checkpoint format decides whether you lose 30 minutes or 30 hours. A sharded checkpoint writes every rank's shard in parallel and records ownership in a manifest...
Phase 19: Capstone Projects / ~90 min

lesson: End-to-End Distributed Training
Lessons 76 through 80 each built one piece.
This is the assembly: a tiny GPT trained across 4 simulated ranks with DDP for gradient sync, ZeRO-1 for optimiser-state sharding, and a sharded checkpoint at the halfway mark. The demo runs 20 s...
Phase 19: Capstone Projects / ~90 min

lesson: Capstone 82 — Jailbreak Taxonomy
A safety harness without a taxonomy is a coin flip. Name the attack before you defend it.
Phase 19: Capstone Projects / ~90 min

lesson: Capstone 83 — Prompt Injection Detector
A detector is a function from prompt to confidence and category. Anything else is a vibe.
Phase 19: Capstone Projects / ~90 min

lesson: Capstone 84 — Refusal Evaluation
Helpfulness on benign prompts and refusal on harmful prompts are two metrics, not one. Measure both.
Phase 19: Capstone Projects / ~90 min

lesson: Capstone 85 — Content Classifier Integration
Classifiers on the output side answer a different question than rules on the input side. Both need a policy router.
Phase 19: Capstone Projects / ~90 min

lesson: Capstone 86 — Constitutional Rules Engine
A rule is a name, a predicate, and an explanation. Anything missing one of those three is a vibe, not a rule.
Phase 19: Capstone Projects / ~90 min

lesson: Capstone 87 — End-to-End Safety Gate
Pre-gen, during-gen, post-gen.
Three checkpoints, one verdict, an audit trail per request.
Phase 19: Capstone Projects / ~90 min

topic: Build
Explore 303 AI From Scratch lessons related to Build.
303 lessons / starts near Setup & Tooling

topic: Python
Explore 230 AI From Scratch lessons related to Python.
230 lessons / starts near Setup & Tooling

topic: Learn
Explore 137 AI From Scratch lessons related to Learn.
137 lessons / starts near Setup & Tooling

topic: Capstone Projects
Explore 85 AI From Scratch lessons related to Capstone Projects.
85 lessons / starts near Capstone Projects

topic: Python (stdlib)
Explore 69 AI From Scratch lessons related to Python (stdlib).
69 lessons / starts near LLMs from Scratch

topic: Python (stdlib
Explore 46 AI From Scratch lessons related to Python (stdlib.
46 lessons / starts near LLMs from Scratch

topic: Agent Engineering
Explore 42 AI From Scratch lessons related to Agent Engineering.
42 lessons / starts near Agent Engineering

topic: Capstone
Explore 38 AI From Scratch lessons related to Capstone.
38 lessons / starts near Computer Vision

topic: Agent
Explore 35 AI From Scratch lessons related to Agent.
35 lessons / starts near Reinforcement Learning

topic: Learn + Build
Explore 35 AI From Scratch lessons related to Learn + Build.
35 lessons / starts near Computer Vision

topic: Ethics, Safety & Alignment
Explore 30 AI From Scratch lessons related to Ethics, Safety & Alignment.
30 lessons / starts near Ethics, Safety & Alignment

topic: NLP — Foundations to Advanced
Explore 29 AI From Scratch lessons related to NLP — Foundations to Advanced.
29 lessons / starts near NLP — Foundations to Advanced

topic: Computer Vision
Explore 28 AI From Scratch lessons related to Computer Vision.
28 lessons / starts near Computer Vision

topic: Infrastructure & Production
Explore 28 AI From Scratch lessons related to Infrastructure & Production.
28 lessons / starts near Infrastructure & Production

topic: Multi-Agent & Swarms
Explore 25 AI From Scratch lessons related to Multi-Agent & Swarms.
25 lessons / starts near Multi-Agent & Swarms

topic: Multimodal AI
Explore 25 AI From Scratch lessons related to Multimodal AI.
25 lessons / starts near Multimodal AI

topic: LLMs from Scratch
Explore 24 AI From Scratch lessons related to LLMs from Scratch.
24 lessons / starts near LLMs from Scratch

topic: Tools & Protocols
Explore 23 AI From Scratch lessons related to Tools & Protocols.
23 lessons / starts near Tools & Protocols

topic: Autonomous Systems
Explore 22 AI From Scratch lessons related to Autonomous Systems.
22 lessons / starts near Autonomous Systems

topic: Math Foundations
Explore 22 AI From Scratch lessons related to Math Foundations.
22 lessons / starts near Math Foundations

topic: Models
Explore 21 AI From Scratch lessons related to Models.
21 lessons / starts near Computer Vision

topic: LLM
Explore 18 AI From Scratch lessons related to LLM.
18 lessons / starts near Computer Vision

topic: ML Fundamentals
Explore 18 AI From Scratch lessons related to ML Fundamentals.
18 lessons / starts near ML Fundamentals

topic: LLM Engineering
Explore 17 AI From Scratch lessons related to LLM Engineering.
17 lessons / starts near LLM Engineering

topic: Multi
Explore 17 AI From Scratch lessons related to Multi.
17 lessons / starts near Deep Learning Core

topic: Speech & Audio
Explore 17 AI From Scratch lessons related to Speech & Audio.
17 lessons / starts near Speech & Audio

topic: Agents
Explore 16 AI From Scratch lessons related to Agents.
16 lessons / starts near NLP — Foundations to Advanced

topic: Evaluation
Explore 16 AI From Scratch lessons related to Evaluation.
16 lessons / starts near ML Fundamentals

topic: Scratch
Explore 16 AI From Scratch lessons related to Scratch.
16 lessons / starts near Deep Learning Core

topic: Transformers Deep Dive
Explore 16 AI From Scratch lessons related to Transformers Deep Dive.
16 lessons / starts near Transformers Deep Dive

topic: Vision
Explore 16 AI From Scratch lessons related to Vision.
16 lessons / starts near Computer Vision

topic: Generative AI
Explore 15 AI From Scratch lessons related to Generative AI.
15 lessons / starts
near Generative AI

topic: MCP
Explore 15 AI From Scratch lessons related to MCP.
15 lessons / starts near LLM Engineering

topic: Deep Learning Core
Explore 13 AI From Scratch lessons related to Deep Learning Core.
13 lessons / starts near Deep Learning Core

topic: Attention
Explore 12 AI From Scratch lessons related to Attention.
12 lessons / starts near NLP — Foundations to Advanced

topic: Audio
Explore 12 AI From Scratch lessons related to Audio.
12 lessons / starts near Speech & Audio

topic: Language
Explore 12 AI From Scratch lessons related to Language.
12 lessons / starts near Computer Vision

topic: Learning
Explore 12 AI From Scratch lessons related to Learning.
12 lessons / starts near Math Foundations

topic: Model
Explore 12 AI From Scratch lessons related to Model.
12 lessons / starts near ML Fundamentals

topic: Reinforcement Learning
Explore 12 AI From Scratch lessons related to Reinforcement Learning.
12 lessons / starts near Reinforcement Learning

topic: Setup & Tooling
Explore 12 AI From Scratch lessons related to Setup & Tooling.
12 lessons / starts near Setup & Tooling

topic: Optimization
Explore 11 AI From Scratch lessons related to Optimization.
11 lessons / starts near Math Foundations

topic: Production
Explore 11 AI From Scratch lessons related to Production.
11 lessons / starts near LLM Engineering

topic: RAG
Explore 11 AI From Scratch lessons related to RAG.
11 lessons / starts near NLP — Foundations to Advanced

topic: Self
Explore 11 AI From Scratch lessons related to Self.
11 lessons / starts near Computer Vision

topic: Token
Explore 11 AI From Scratch lessons related to Token.
11 lessons / starts near Setup & Tooling

topic: Pipeline
Explore 10 AI From Scratch lessons related to Pipeline.
10 lessons / starts near ML Fundamentals

topic: Tool
Explore 10 AI From Scratch lessons related to Tool.
10 lessons / starts near LLM Engineering

topic: Tuning
Explore 10 AI From Scratch lessons related to Tuning.
10 lessons / starts near ML Fundamentals

topic: Video
Explore 10 AI From Scratch lessons related to Video.
10
lessons / starts near Computer Vision

topic: Generation
Explore 9 AI From Scratch lessons related to Generation.
9 lessons / starts near Computer Vision

topic: Image
Explore 9 AI From Scratch lessons related to Image.
9 lessons / starts near Computer Vision

topic: Lesson
Explore 9 AI From Scratch lessons related to Lesson.
9 lessons / starts near Capstone Projects

topic: Multimodal
Explore 9 AI From Scratch lessons related to Multimodal.
9 lessons / starts near Multimodal AI

topic: Context
Explore 8 AI From Scratch lessons related to Context.
8 lessons / starts near NLP — Foundations to Advanced

topic: Decoding
Explore 8 AI From Scratch lessons related to Decoding.
8 lessons / starts near NLP — Foundations to Advanced

topic: Diffusion
Explore 8 AI From Scratch lessons related to Diffusion.
8 lessons / starts near Computer Vision

topic: End
Explore 8 AI From Scratch lessons related to End.
8 lessons / starts near Tools & Protocols

topic: Eval
Explore 8 AI From Scratch lessons related to Eval.
8 lessons / starts near NLP — Foundations to Advanced

topic: Inference
Explore 8 AI From Scratch lessons related to Inference.
8 lessons / starts near NLP — Foundations to Advanced

topic: Memory
Explore 8 AI From Scratch lessons related to Memory.
8 lessons / starts near Computer Vision

topic: Prompt
Explore 8 AI From Scratch lessons related to Prompt.
8 lessons / starts near LLM Engineering

topic: Temperature
Explore 8 AI From Scratch lessons related to Temperature.
8 lessons / starts near Math Foundations

topic: Training
Explore 8 AI From Scratch lessons related to Training.
8 lessons / starts near ML Fundamentals

topic: Alignment
Explore 7 AI From Scratch lessons related to Alignment.
7 lessons / starts near Autonomous Systems

topic: Data
Explore 7 AI From Scratch lessons related to Data.
7 lessons / starts near Setup & Tooling

topic: Fine
Explore 7 AI From Scratch lessons related to Fine.
7 lessons / starts near Computer Vision

topic: Harness
Explore 7 AI From Scratch lessons related to Harness.
7 lessons / starts near LLMs from Scratch

topic: OpenAI
Explore 7 AI From Scratch lessons related to OpenAI.
7 lessons / starts near Tools & Protocols

topic: Retrieval
Explore 7 AI From Scratch lessons related to Retrieval.
7 lessons / starts near Computer Vision

topic: Task
Explore 7 AI From Scratch lessons related to Task.
7 lessons / starts near Tools & Protocols

topic: Text
Explore 7 AI From Scratch lessons related to Text.
7 lessons / starts near NLP — Foundations to Advanced

topic: Transformers
Explore 7 AI From Scratch lessons related to Transformers.
7 lessons / starts near Computer Vision

topic: Anthropic
Explore 6 AI From Scratch lessons related to Anthropic.
6 lessons / starts near Tools & Protocols

topic: Architecture
Explore 6 AI From Scratch lessons related to Architecture.
6 lessons / starts near Computer Vision

topic: Checkpoint
Explore 6 AI From Scratch lessons related to Checkpoint.
6 lessons / starts near LLM Engineering

topic: Cross
Explore 6 AI From Scratch lessons related to Cross.
6 lessons / starts near Multimodal AI

topic: Cross-attention
Explore 6 AI From Scratch lessons related to Cross-attention.
6 lessons / starts near Computer Vision

topic: Encoder
Explore 6 AI From Scratch lessons related to Encoder.
6 lessons / starts near NLP — Foundations to Advanced

topic: Engineering
Explore 6 AI From Scratch lessons related to Engineering.
6 lessons / starts near ML Fundamentals

topic: Gradient
Explore 6 AI From Scratch lessons related to Gradient.
6 lessons / starts near Math Foundations

topic: Learning rate
Explore 6 AI From Scratch lessons related to Learning rate.
6 lessons / starts near Math Foundations

topic: LLMs
Explore 6 AI From Scratch lessons related to LLMs.
6 lessons / starts near Multi-Agent & Swarms

topic: Machine
Explore 6 AI From Scratch lessons related to Machine.
6 lessons / starts near Math Foundations

topic: Modeling
Explore 6 AI From Scratch lessons related to Modeling.
6 lessons / starts near Computer Vision

topic: Open
Explore 6 AI From Scratch lessons related to Open.
6 lessons / starts near Computer Vision

topic: Policy
Explore 6 AI
From Scratch lessons related to Policy.
6 lessons / starts near Reinforcement Learning

topic: Python (with numpy)
Explore 6 AI From Scratch lessons related to Python (with numpy).
6 lessons / starts near LLMs from Scratch

topic: Safety
Explore 6 AI From Scratch lessons related to Safety.
6 lessons / starts near LLM Engineering

topic: Search
Explore 6 AI From Scratch lessons related to Search.
6 lessons / starts near NLP — Foundations to Advanced

topic: Server
Explore 6 AI From Scratch lessons related to Server.
6 lessons / starts near LLM Engineering

topic: State
Explore 6 AI From Scratch lessons related to State.
6 lessons / starts near NLP — Foundations to Advanced

topic: Streaming
Explore 6 AI From Scratch lessons related to Streaming.
6 lessons / starts near Setup & Tooling

topic: Time
Explore 6 AI From Scratch lessons related to Time.
6 lessons / starts near ML Fundamentals

topic: Transformer
Explore 6 AI From Scratch lessons related to Transformer.
6 lessons / starts near Transformers Deep Dive

topic: TypeScript
Explore 6 AI From Scratch lessons related to TypeScript.
6 lessons / starts near Setup & Tooling

topic: Use
Explore 6 AI From Scratch lessons related to Use.
6 lessons / starts near LLM Engineering

topic: Voice
Explore 6 AI From Scratch lessons related to Voice.
6 lessons / starts near Speech & Audio

topic: Autonomous
Explore 5 AI From Scratch lessons related to Autonomous.
5 lessons / starts near Autonomous Systems

topic: Autoregressive
Explore 5 AI From Scratch lessons related to Autoregressive.
5 lessons / starts near Transformers Deep Dive

topic: Bi-encoder
Explore 5 AI From Scratch lessons related to Bi-encoder.
5 lessons / starts near NLP — Foundations to Advanced

topic: Building
Explore 5 AI From Scratch lessons related to Building.
5 lessons / starts near LLMs from Scratch

topic: Chunking
Explore 5 AI From Scratch lessons related to Chunking.
5 lessons / starts near NLP — Foundations to Advanced

topic: Constitutional
Explore 5 AI From Scratch lessons related to Constitutional.
5 lessons / starts near LLMs from
Scratch

topic: Cosine similarity
Explore 5 AI From Scratch lessons related to Cosine similarity.
5 lessons / starts near Math Foundations

topic: Cross-encoder
Explore 5 AI From Scratch lessons related to Cross-encoder.
5 lessons / starts near NLP — Foundations to Advanced

topic: Debate
Explore 5 AI From Scratch lessons related to Debate.
5 lessons / starts near Agent Engineering

topic: Detection
Explore 5 AI From Scratch lessons related to Detection.
5 lessons / starts near ML Fundamentals

topic: EAGLE
Explore 5 AI From Scratch lessons related to EAGLE.
5 lessons / starts near Transformers Deep Dive

topic: Embeddings
Explore 5 AI From Scratch lessons related to Embeddings.
5 lessons / starts near NLP — Foundations to Advanced

topic: Flow
Explore 5 AI From Scratch lessons related to Flow.
5 lessons / starts near Computer Vision

topic: Function calling
Explore 5 AI From Scratch lessons related to Function calling.
5 lessons / starts near LLM Engineering

topic: Handoff
Explore 5 AI From Scratch lessons related to Handoff.
5 lessons / starts near Agent Engineering

topic: Long
Explore 5 AI From Scratch lessons related to Long.
5 lessons / starts near NLP — Foundations to Advanced

topic: Loop
Explore 5 AI From Scratch lessons related to Loop.
5 lessons / starts near Agent Engineering

topic: Overfitting
Explore 5 AI From Scratch lessons related to Overfitting.
5 lessons / starts near ML Fundamentals

topic: Parallel
Explore 5 AI From Scratch lessons related to Parallel.
5 lessons / starts near Tools & Protocols

topic: Perplexity
Explore 5 AI From Scratch lessons related to Perplexity.
5 lessons / starts near Math Foundations

topic: Prompt caching
Explore 5 AI From Scratch lessons related to Prompt caching.
5 lessons / starts near LLM Engineering

topic: Real
Explore 5 AI From Scratch lessons related to Real.
5 lessons / starts near Computer Vision

track: Prompt Engineering
Move from prompting by luck to prompting by design.
Structured techniques for reliable, repeatable results across any model.
Intermediate / 4 lessons

track: Building With LLMs
Go from notebook to product. Learn the engineering patterns for shipping reliable AI features: RAG, function calling, streaming, and monitoring.
Intermediate / 4 lessons

track: AI Product Design
Design AI experiences people trust. Patterns for handling uncertainty, latency, errors, and the new interaction models AI unlocks.
Advanced / 4 lessons

module: Core techniques
Frame, constrain, and demonstrate.
Prompt Engineering

module: Advanced patterns
Chain prompts and defend them.
Prompt Engineering

module: Retrieval
RAG, chunks, and citations.
Building With LLMs

module: Tools and agents
Function calls and tool design.
Building With LLMs

module: Production
Streaming, cost, and observability.
Building With LLMs

module: Designing for trust
Uncertainty, review, and feedback.
AI Product Design

module: Launching AI features
Release quality and monitoring.
AI Product Design

lesson: Role, Task, and Context Framing
Frame prompts so the model knows the job, the audience, the input, and the expected shape of the answer.
Prompt Engineering / Core techniques

lesson: Examples and Constraints
Use few-shot examples and constraints to make outputs repeatable without overfitting the prompt to one case.
Prompt Engineering / Core techniques

lesson: Prompt Chaining
Break large tasks into smaller calls so each step can be inspected, reused, and tested.
Prompt Engineering / Advanced patterns

lesson: Prompt Injection Defense
Learn practical guardrails for untrusted input, tool calls, and instructions hidden inside retrieved content.
Prompt Engineering / Advanced patterns

lesson: RAG Blueprint
Design a retrieval-augmented generation system with ingestion, chunking, retrieval, synthesis, and evaluation.
Building With LLMs / Retrieval

lesson: Function Calling
Teach models to request structured tool calls instead of pretending to complete actions in natural language.
Building With LLMs / Tools and agents

lesson: Streaming UX
Design
interfaces that feel fast while making partial, uncertain, or long-running AI output understandable.Building With LLMs / Production lessonCost and ObservabilityTrack the signals that make AI features operational: cost, latency, quality, errors, and user feedback.Building With LLMs / Production lessonDesigning for UncertaintyCreate interfaces that acknowledge AI uncertainty without making users do all the verification work.AI Product Design / Designing for trust lessonHuman in the LoopDecide when AI should act autonomously, when it should suggest, and when a person must approve.AI Product Design / Designing for trust lessonFeedback LoopsTurn user feedback into a system that improves prompts, retrieval, evaluations, and product decisions.AI Product Design / Designing for trust lessonAI Feature Launch ChecklistPrepare an AI product slice for release with quality gates, fallback behavior, monitoring, and communication.AI Product Design / Launching AI features projectBuild a reusable prompt libraryCreate versioned prompt templates for repeated workflows.Prompt Engineering / Intermediate projectShip a prompt-powered toolTurn one prompt chain into a small product workflow.Prompt Engineering / Intermediate projectBuild a RAG chatbotDesign a source-grounded chatbot with citations and quality checks.Building With LLMs / Intermediate projectShip an agent that uses toolsDesign a safe tool-using assistant with validation and fallback behavior.Building With LLMs / Advanced projectRun a trust auditAudit an AI workflow for uncertainty, evidence, user control, and recovery paths.AI Product Design / Advanced projectWrite an AI feature launch planPrepare a model-powered product slice for launch with gates, fallbacks, and monitoring.AI Product Design / Advanced toolQdrantVector database for semantic search, recommendations, and RAG retrieval.Research and Data / Open Source toolLangChainFramework for composing LLM applications, chains, retrieval, and tool calls.Agents and Automation / 
Open Source toolOllamaRun local language models for private experiments and offline AI workflows.Coding and Dev / Free toolPromptlyVersion, test, and share prompts across repeatable team workflows.Productivity / Freemium toolAgentKitComposable patterns for building tool-using agents with validation and monitoring.Agents and Automation / Open Source toolDataLensAsk structured questions of datasets and produce plain-language analysis.Research and Data / Paid gameClaude Certified Architect ChallengeClear 10 AI architecture floors covering Claude agents, MCP tools, Claude Code workflows, prompt reliability, and context management.Certification Challenge / 20 min gamePrompt GolfReach the target output in the fewest tokens possible.Challenge / 5 min gameHallucination HuntSpot the fabricated fact in AI-generated answers.Quiz / 3 min gameEmbedding MatchGroup concepts by semantic similarity before the clock runs out.Puzzle / 7 min gameAgent ArenaDesign an agent and watch it tackle live tasks against constraints.Sandbox / 20 min examAI Engineer PrepA progressive interview path around LLMs, RAG, agents, evaluations, and production readiness.AI / LLM Engineer / 180 questions examML Engineer PrepFundamentals through ML system design, with practical checks for model quality and deployment tradeoffs.Machine Learning Engineer / 240 questions examData Scientist PrepStatistics, experimentation, modeling, and narrative thinking for data science loops.Data Scientist / 210 questions examAI Product Manager PrepStrategy, metrics, user trust, prioritization, and responsible AI launch decisions.AI Product Manager / 120 questions newsletterThe year agents went mainstreamTool-using agents moved from demos to production. 
We break down the patterns that actually shipped.Issue 47 / Agents newsletterSmall models, big impactWhy the most interesting work this month happened on models you can run on a laptop.Issue 46 / Models newsletterEvaluations are the new testsA practical guide to building eval suites your team will actually maintain.Issue 45 / Engineering newsletterDesigning for the pauseWhat great products do during the second an AI is thinking.Issue 44 / Design communityCommunityContribute lessons, product polish, tools, exams, and open-source improvements.Open source