Octo aims to advance the development of versatile robots capable of performing various manual tasks, leveraging large datasets and cutting-edge neural network technology.
Researchers at the University of California, Berkeley (UC Berkeley), Stanford University, and Carnegie Mellon University (CMU) have introduced Octo, an open-source generalist model for robotic manipulation. Octo is designed to let many different robotic systems manipulate a wide range of objects effectively. The model is described in a preprint posted on arXiv.
The team’s objectives were twofold: to develop a versatile generalist robotics model applicable to various robots and to create open-source code enabling other researchers to build similar models in the future.
In the technology research and development community, high-performing computational tools that can be applied across multiple systems are often referred to as foundation models. An example is ChatGPT, which equips various agents and systems with natural language processing (NLP) capabilities.
Octo is based on transformers, the same type of neural network used in ChatGPT. The model was trained on the Open X-Embodiment dataset, the largest dataset of robotic manipulation trajectories compiled to date. Octo can process a diverse range of inputs, including different types of camera images, robot joint readings, language instructions, and goal images.
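To make the idea of a single model accepting many kinds of input more concrete, here is a minimal Python sketch of how optional inputs could be flattened into one tagged sequence of the kind a transformer consumes. It is an illustration only, not Octo's code: the names `RobotInput` and `to_token_sequence` are hypothetical, and a real system would encode images and text into learned embeddings rather than keep raw values.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class RobotInput:
    """Hypothetical container; every field is optional, so robots with
    different sensors and different task formats share one structure."""
    camera_images: Dict[str, list] = field(default_factory=dict)
    joint_readings: Optional[List[float]] = None
    language_instruction: Optional[str] = None
    goal_image: Optional[list] = None


def to_token_sequence(inp: RobotInput) -> List[Tuple[str, object]]:
    """Flatten whichever inputs are present into (tag, payload) pairs.

    Inputs that are missing are simply skipped, which is one simple way
    a single sequence model can serve many robot setups.
    """
    tokens: List[Tuple[str, object]] = []
    if inp.language_instruction is not None:
        for word in inp.language_instruction.lower().split():
            tokens.append(("language", word))
    if inp.goal_image is not None:
        tokens.append(("goal_image", inp.goal_image))
    for name in sorted(inp.camera_images):  # fixed order for repeatability
        tokens.append((f"image:{name}", inp.camera_images[name]))
    if inp.joint_readings is not None:
        tokens.append(("joints", tuple(inp.joint_readings)))
    return tokens


if __name__ == "__main__":
    example = RobotInput(
        camera_images={"wrist": [[0, 0], [0, 0]]},
        language_instruction="Pick up the spoon",
    )
    for tag, payload in to_token_sequence(example):
        print(tag, payload)
```

A robot with only a wrist camera and a language command and a robot with several cameras, joint sensors, and a goal image would both produce valid sequences here, just of different lengths.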
“Octo is what we call a ‘generalist’ robot model, a neural network that can control many different types of robots and make them fulfill requests like ‘pick up the spoon,’ ‘close the drawer,’ ‘wipe the table’ etc.,” the researchers explained. “Being a generalist and working on many robots is key, because if you look at research labs around the world, many of them use different robots, so the only way to ensure Octo can be used by many researchers is by supporting a wide range of robots.”
“Much of the current progress in AI is driven by large datasets and large models,” researchers Dibya Ghosh, Homer Walke, Karl Pertsch, Kevin Black, and Oier Mees told Tech Xplore. “In the robotics community, we recently assembled the Open X-Embodiment dataset, a big manipulation dataset that pools data from many research institutions. While this new dataset is a really exciting resource, at the time there weren’t many models that could make use of it yet.”
“We want to build similar foundation models, but for robot control, or in other words, models that can control many robots and make them solve many different tasks,” the researchers stated. “Octo is a first step towards that goal. Its training looks very similar to models like ChatGPT: we curate a large and diverse dataset, in our case robot data instead of text, and train a large model to predict the next action the robot should execute given the current robot state and a task instruction.”
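The training recipe the researchers describe, predicting the next action from the current robot state and a task instruction, can be illustrated with a toy example. The sketch below is not Octo's training code: it assumes a tiny linear policy in place of a transformer, a made-up two-dimensional robot, and a hypothetical "expert" that moves toward a task-specific goal. Only the structure of the objective mirrors the quote.

```python
import random


def predict(weights, features):
    """Linear policy: one weighted sum per action dimension."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def train_step(weights, features, target_action, lr=0.1):
    """One gradient step on squared error between the predicted action
    and the demonstrated action. Returns the loss before the update."""
    pred = predict(weights, features)
    loss = 0.0
    for i, row in enumerate(weights):
        err = pred[i] - target_action[i]
        loss += err * err
        for j in range(len(row)):
            row[j] -= lr * 2.0 * err * features[j]
    return loss


def make_dataset(n, seed=0):
    """Toy demonstrations. Features are [state_x, state_y, task_0, task_1];
    the hypothetical expert moves toward a goal that depends on the task."""
    rng = random.Random(seed)
    goals = {0: (0.0, 0.0), 1: (1.0, 0.0)}  # assumed goal per task
    data = []
    for _ in range(n):
        state = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        task_id = rng.randrange(2)
        task = [1.0, 0.0] if task_id == 0 else [0.0, 1.0]
        gx, gy = goals[task_id]
        action = [gx - state[0], gy - state[1]]
        data.append((state + task, action))
    return data


def train(epochs=50, seed=0):
    """Fit the policy to the demonstrations; returns weights and the
    average loss per epoch."""
    data = make_dataset(200, seed)
    weights = [[0.0] * 4 for _ in range(2)]
    history = []
    for _ in range(epochs):
        total = sum(train_step(weights, f, a) for f, a in data)
        history.append(total / len(data))
    return weights, history


if __name__ == "__main__":
    weights, history = train()
    print("first epoch loss:", history[0])
    print("last epoch loss:", history[-1])
    # State (0.5, -0.5) under task 1, whose assumed goal is (1, 0)
    print("predicted action:", predict(weights, [0.5, -0.5, 0.0, 1.0]))
```

The same loop shape applies at scale: swap the toy demonstrations for recorded robot trajectories, the task one-hot vector for an encoded instruction, and the linear map for a large neural network.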