In this tutorial and the accompanying code, I walk through the process of building a mini generalist robotics policy. Starting from Karpathy’s mini GPT code, we build a vision transformer, and from the vision transformer, we build a generalist robotics policy, which is a type of multimodal transformer model. The final model is based on the paper “Octo: An Open-Source Generalist Robot Policy,” a great paper that also releases open-source code.
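One of the main changes when turning a GPT into a vision transformer is the input. An image is cut into fixed-size patches, each patch is flattened and linearly projected to the model width, and a position embedding is added, so the transformer sees an ordinary sequence of tokens. Other typical changes, such as dropping the causal attention mask, are not shown. The sketch below illustrates only the patch-embedding idea in plain Python. The function names `patchify` and `embed_patches`, the raster patch order, and the learned position table are illustrative assumptions, not the tutorial's actual code.

```python
import random


def patchify(image, patch_size):
    """Split an image given as nested lists [H][W][C] into flattened patches.

    Patches are returned in raster order (left to right, top to bottom),
    each as a flat list of length patch_size * patch_size * C.
    """
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):          # patch rows
        for left in range(0, width, patch_size):      # patch columns
            patch = []
            for row in range(top, top + patch_size):
                for col in range(left, left + patch_size):
                    patch.extend(image[row][col])      # all channels of this pixel
            patches.append(patch)
    return patches


def embed_patches(patches, weight, pos_embedding):
    """Project each patch with weight [in_dim][d_model], then add its position vector."""
    tokens = []
    for index, patch in enumerate(patches):
        projected = [
            sum(value * weight[i][j] for i, value in enumerate(patch))
            for j in range(len(weight[0]))
        ]
        tokens.append([p + q for p, q in zip(projected, pos_embedding[index])])
    return tokens


if __name__ == "__main__":
    random.seed(0)
    height = width = 4   # hypothetical tiny image
    channels = 3
    patch_size = 2
    d_model = 8          # hypothetical embedding width
    image = [[[random.random() for _ in range(channels)] for _ in range(width)]
             for _ in range(height)]
    patches = patchify(image, patch_size)
    # Randomly initialised parameters stand in for learned ones.
    weight = [[random.gauss(0, 0.02) for _ in range(d_model)]
              for _ in range(patch_size * patch_size * channels)]
    pos = [[random.gauss(0, 0.02) for _ in range(d_model)] for _ in range(len(patches))]
    tokens = embed_patches(patches, weight, pos)
    print(len(tokens), len(tokens[0]))  # number of tokens, width of each token
```

A 4×4 RGB image with 2×2 patches becomes 4 tokens of width 8, which can be fed to the same transformer blocks used for text tokens. In a multimodal policy such as the one described above, tokens from other modalities would be concatenated into the same sequence.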
Tutorial Video
Details
The code is mostly in Python. I include a Python notebook that recreates the steps in the video, and a small standalone script containing only the final version of the model.