AI From Scratch/Lesson 21/~180 minutes
Embodied VLAs: RT-2, OpenVLA, π0, GR00T
RT-2 (Google DeepMind, July 2023) was the first model to read a recipe off a website and execute it on a kitchen robot. RT-2 discretized robot actions as text tokens, co-fine-tuned a vision-language model (VLM) on web data plus robot-action data, and proved that w...
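To make "actions as text tokens" concrete, here is a minimal stdlib sketch of an action tokenizer. It is an illustration under stated assumptions, not RT-2's actual code: the function names `encode_action` and `decode_action`, the 256-bin resolution, and the normalized action range of [-1, 1] are all choices made for this example.

```python
from typing import List, Sequence

NUM_BINS = 256  # assumed resolution per action dimension (illustrative)


def encode_action(action: Sequence[float], low: float = -1.0,
                  high: float = 1.0, num_bins: int = NUM_BINS) -> str:
    """Map each continuous action dimension to an integer bin and
    render the result as a space-separated string a language model can emit."""
    tokens = []
    for value in action:
        clipped = min(max(value, low), high)        # keep the value inside [low, high]
        frac = (clipped - low) / (high - low)       # rescale to [0, 1]
        bin_index = min(int(frac * num_bins), num_bins - 1)  # top edge falls in last bin
        tokens.append(str(bin_index))
    return " ".join(tokens)


def decode_action(text: str, low: float = -1.0, high: float = 1.0,
                  num_bins: int = NUM_BINS) -> List[float]:
    """Invert encode_action by returning the center of each bin."""
    width = (high - low) / num_bins
    return [low + (int(tok) + 0.5) * width for tok in text.split()]


if __name__ == "__main__":
    # Hypothetical 3-dimensional action, e.g. an end-effector displacement.
    text = encode_action([-1.0, 0.0, 1.0])
    print(text)
    print(decode_action(text))
```

Decoding returns bin centers, so a round trip is lossy by at most half a bin width (1/256 of the range here). That quantization error is the price of letting the model emit actions through the same token interface it uses for text.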
Python (stdlib): action tokenizer + VLA inference skeleton