Interactive Language: Talking to Robots in Real Time
@article{Lynch2022InteractiveLT,
  title   = {Interactive Language: Talking to Robots in Real Time},
  author  = {Corey Lynch and Ayzaan Wahid and Jonathan Tompson and Tianli Ding and James Betker and Robert K. Baruch and Travis Armstrong and Peter R. Florence},
  journal = {ArXiv},
  year    = {2022},
  volume  = {abs/2210.06407}
}
We present a framework for building interactive, real-time, natural-language-instructable robots in the real world, and we open-source related assets (dataset, environment, benchmark, and policies). Trained with behavioral cloning on a dataset of hundreds of thousands of language-annotated trajectories, the resulting policy can proficiently execute an order of magnitude more commands than previous works: specifically, we estimate a 93.5% success rate on a set of 87,000 unique natural language…
6 Citations
Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models
- Computer Science
- 2022
DIAL is introduced, which utilizes semi-supervised language labels leveraging the semantic understanding of CLIP to propagate knowledge onto large datasets of unlabelled demonstration data and then train language-conditioned policies on the augmented datasets, enabling cheaper acquisition of useful language descriptions compared to expensive human labels.
Skill Acquisition by Instruction Augmentation on Offline Datasets
- Computer Science
- 2022
DIAL is applied to a challenging real-world robotic manipulation domain, enabling imitation learning policies to acquire new capabilities and generalize to 60 novel instructions unseen in the original dataset.
PIRLNav: Pretraining with Imitation and RL Finetuning for ObjectNav
- Computer Science, ArXiv
- 2023
This work presents a two-stage learning scheme for IL pretraining on human demonstrations followed by RL-finetuning, and investigates whether human demonstrations can be replaced with 'free' sources of demonstrations, e.g.
StructDiffusion: Object-Centric Diffusion for Semantic Rearrangement of Novel Objects
- Computer Science, ArXiv
- 2022
This work proposes StructDiffusion, which combines a diffusion model and an object-centric transformer to construct structures out of a single RGB-D image based on high-level language goals, such as “set the table”, and shows how diffusion models can be used for complex multi-step 3D planning tasks.
Calibrated Interpretation: Confidence Estimation in Semantic Parsing
- Computer Science, ArXiv
- 2022
This work examines the calibration characteristics of six models across three model families on two common English semantic parsing datasets, finding that many models are reasonably well-calibrated and that there is a trade-off between calibration and performance.
Visuomotor Control in Multi-Object Scenes Using Object-Aware Representations
- Computer Science, ArXiv
- 2022
This paper demonstrates that existing representations are insufficient for general-purpose robotics tasks, as they fail to capture the complexity of scenes with many components, and presents an object-aware approach that outperforms state-of-the-art object-agnostic techniques as well as methods trained on raw RGB images.
References
Showing 1-10 of 61 references
Towards Real-Time Natural Language Corrections for Assistive Robots
- Computer Science
- 2010
This paper proposes a generalizable natural language interface that allows users to provide corrective instructions to an assistive robotic manipulator in real-time and develops a language model using data collected from Amazon Mechanical Turk in hopes of capturing a comprehensive selection of terminology that real people use to describe desired corrections.
CALVIN: A Benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
- Computer Science, IEEE Robotics and Automation Letters
- 2022
CALVIN (Composing Actions from Language and Vision), an open-source simulated benchmark to learn long-horizon language-conditioned tasks, is presented, suggesting that there is significant room for developing innovative agents that learn to relate human language to their world models with this benchmark.
Correcting Robot Plans with Natural Language Feedback
- Computer Science, Robotics: Science and Systems XVIII
- 2022
This paper describes how to map from natural language sentences to transformations of cost functions and shows that these transformations enable users to correct goals, update robot motions to accommodate additional user preferences, and recover from planning errors.
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
- Computer Science, ArXiv
- 2022
It is shown how low-level skills can be combined with large language models so that the language model provides high-level knowledge about the procedures for performing complex and temporally extended instructions, while value functions associated with these skills provide the grounding necessary to connect this knowledge to a particular physical environment.
Language Conditioned Imitation Learning Over Unstructured Data
- Computer Science, Robotics: Science and Systems
- 2021
This work presents a method for incorporating free-form natural language conditioning into imitation learning, and proposes combining text conditioned policies with large pretrained neural language models to scale up the number of instructions an agent can follow.
Learning to Parse Natural Language Commands to a Robot Control System
- Computer Science, ISER
- 2012
This work discusses the problem of parsing natural language commands to actions and control structures that can be readily implemented in a robot execution system, and learns a parser based on example pairs of English commands and corresponding control language expressions.
Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents
- Computer Science, ICML
- 2022
This paper investigates the possibility of grounding high-level tasks, expressed in natural language, to a chosen set of actionable steps and proposes a procedure that conditions on existing demonstrations and semantically translates the plans to admissible actions.
Understanding Natural Language Commands for Robotic Navigation and Mobile Manipulation
- Computer Science, AAAI
- 2011
A new model for understanding natural language commands given to autonomous systems that perform navigation and mobile manipulation in semi-structured environments that dynamically instantiates a probabilistic graphical model for a particular natural language command according to the command's hierarchical and compositional semantic structure.
Inner Monologue: Embodied Reasoning through Planning with Language Models
- Computer Science, ArXiv
- 2022
This work proposes that by leveraging environment feedback, LLMs are able to form an inner monologue that allows them to more richly process and plan in robotic control scenarios, and finds that closed-loop language feedback significantly improves high-level instruction completion on three domains.
Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation
- Computer Science, CoRL
- 2021
This work studies the problem of learning a range of vision-based manipulation tasks from a large offline dataset of robot interaction, and outperforms both goal-image specifications and language conditioned imitation techniques by more than 25%, and is able to perform visuomotor tasks from natural language, such as “open the right drawer” and “move the stapler” on a Franka Emika Panda robot.