Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models
@article{Xiao2022RoboticSA,
  title   = {Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models},
  author  = {Ted Xiao and Harris Chan and Pierre Sermanet and Ayzaan Wahid and Anthony Brohan and Karol Hausman and Sergey Levine and Jonathan Tompson},
  journal = {ArXiv},
  year    = {2022},
  volume  = {abs/2211.11736}
}
In recent years, much progress has been made in learning robotic manipulation policies that follow natural language instructions. Such methods typically learn from corpora of robot-language data that was either collected with specific tasks in mind or expensively re-labelled by humans with rich language descriptions in hindsight. Recently, large-scale pretrained vision-language models (VLMs) like CLIP [38] or ViLD [21] have been applied to robotics for learning representations and scene…
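The abstract describes applying pretrained VLMs such as CLIP to ground language in robot scenes. As a minimal, illustrative sketch (not the paper's pipeline), the snippet below shows how a pretrained CLIP model can score a set of candidate instructions against a single robot camera frame; the checkpoint name, frame path, and candidate list are assumptions made for the example.

```python
# Sketch: score candidate language instructions against a robot camera frame
# with a pretrained CLIP model (Hugging Face transformers). Illustrative only;
# the checkpoint, image path, and candidate instructions are placeholders.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

candidate_instructions = [
    "pick up the red block",
    "open the top drawer",
    "push the green bowl to the left",
]
frame = Image.open("episode_final_frame.png")  # last frame of a robot episode

inputs = processor(
    text=candidate_instructions, images=frame, return_tensors="pt", padding=True
)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image: similarity of the frame to each candidate instruction
scores = outputs.logits_per_image.softmax(dim=-1).squeeze(0)
best_idx = int(scores.argmax())
print(f"best instruction: {candidate_instructions[best_idx]!r} "
      f"(score {scores[best_idx]:.3f})")
```

Such a VLM-based scorer could, in principle, relabel robot trajectories with richer hindsight instructions than the ones originally collected, which is the general direction the abstract points toward.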
One Citation
Distilling Internet-Scale Vision-Language Models into Embodied Agents
- Computer Science
- 2023
This work outlines a new and effective way to use internet-scale VLMs, repurposing the generic language grounding acquired by such models to teach task-relevant groundings to embodied agents.