The cutting edge of robotics and AI is being pushed even further with the introduction of OK-Robot, an open-knowledge-based framework that combines pre-trained machine learning models with robotic systems to perform tasks in environments it has never seen before.
OK-Robot integrates vision-language models (VLMs) with robotics primitives, drawing on recent models from the VLM and robotics communities, to execute pick-and-drop operations without any task-specific training.
OK-Robot marks an important step forward: it shows that contemporary open-vocabulary vision-language models can identify objects and navigate to them zero-shot, demonstrating that pre-trained models can be effectively combined to carry out such tasks without additional training.
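To make the idea concrete, here is a minimal sketch of how an open-vocabulary query might drive such a zero-shot pick-and-drop loop. It is illustrative only, not OK-Robot's actual implementation: the scoring step uses a pre-trained CLIP model via Hugging Face `transformers`, while `robot`, `scan_environment`, `navigate_to`, `pick`, and `drop` are hypothetical placeholders for the navigation and grasping primitives.

```python
"""Sketch: match a free-form text query against scanned camera views using a
pre-trained VLM (zero-shot, no task-specific training), then hand the best
match to hypothetical navigation/grasping primitives."""
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Pre-trained open-vocabulary model; no fine-tuning is performed.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def best_matching_view(query: str, views: list[Image.Image]) -> int:
    """Return the index of the camera view that best matches the text query."""
    inputs = processor(text=[query], images=views,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        # logits_per_text has shape (1, num_views): similarity of the
        # query against every scanned view.
        logits = model(**inputs).logits_per_text
    return int(logits.argmax())

# Hypothetical usage; `robot` and its methods are illustrative stand-ins.
# views, poses = robot.scan_environment()
# idx = best_matching_view("a blue coffee mug", views)
# robot.navigate_to(poses[idx])   # navigation primitive
# robot.pick("a blue coffee mug") # grasping primitive
# robot.navigate_to(drop_pose)
# robot.drop()
```

The key point the sketch illustrates is the division of labor described above: the pre-trained VLM supplies open-vocabulary perception, while separate robotics primitives handle navigation and manipulation, with no component trained on the target environment.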