ManipulaTHOR: A Framework for Visual Object Manipulation
@article{Ehsani2021ManipulaTHORAF,
  title={ManipulaTHOR: A Framework for Visual Object Manipulation},
  author={Kiana Ehsani and Winson Han and Alvaro Herrasti and Eli VanderBilt and Luca Weihs and Eric Kolve and Aniruddha Kembhavi and Roozbeh Mottaghi},
  journal={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021},
  pages={4495-4504}
}
The domain of Embodied AI has recently witnessed substantial progress, particularly in navigating agents within their environments. These early successes have laid the building blocks for the community to tackle tasks that require agents to actively interact with objects in their environment. Object manipulation is an established research domain within the robotics community and poses several challenges including manipulator motion, grasping and long-horizon planning, particularly when dealing…
17 Citations
Object Manipulation via Visual Target Localization
- Computer Science · ArXiv
- 2022
This work proposes Manipulation via Visual Object Location Estimation (m-VOLE), an approach that explores the environment in search of target objects, computes their 3D coordinates once they are located, and then continues to estimate their 3D locations even when the objects are not visible, thus robustly aiding the task of manipulating these objects throughout the episode.
Towards Disturbance-Free Visual Mobile Manipulation
- Computer Science · ArXiv
- 2021
This work develops a new disturbance-avoidance methodology, at the heart of which is the auxiliary task of disturbance prediction; together with knowledge distillation of the disturbance signal into the agent, this greatly enhances sample efficiency and final performance.
Core Challenges in Embodied Vision-Language Planning
- Computer Science · ArXiv
- 2021
A taxonomy is proposed to unify Embodied Vision-Language Planning (EVLP) tasks, a family of prominent embodied navigation and manipulation problems that jointly use computer vision and natural language, and an in-depth analysis and comparison of new and current algorithmic approaches, metrics, simulated environments, and datasets used for EVLP tasks is presented.
ManiSkill: Generalizable Manipulation Skill Benchmark with Large-Scale Demonstrations
- Computer Science · NeurIPS Datasets and Benchmarks
- 2021
The proposed SAPIEN Manipulation Skill Benchmark (ManiSkill) benchmarks manipulation skills over diverse objects in a full-physics simulator and provides baselines using 3D deep learning and learning-from-demonstration (LfD) algorithms.
RFUniverse: A Physics-based Action-centric Interactive Environment for Everyday Household Tasks
- Computer Science · ArXiv
- 2022
A novel physics-based, action-centric environment, RFUniverse, is proposed for robot learning of everyday household tasks; it supports interactions among 87 atomic actions and 8 basic object types in a visually and physically plausible way.
ASC me to Do Anything: Multi-task Training for Embodied AI
- Computer Science
- 2022
Atomic Skill Completion (ASC) is proposed, an approach for multi-task training for Embodied AI, where a set of atomic skills shared across multiple tasks are composed together to perform the tasks.
Habitat 2.0: Training Home Assistants to Rearrange their Habitat
- Computer Science · NeurIPS
- 2021
Habitat 2.0 is introduced, a simulation platform for training virtual robots in interactive 3D environments and complex physics-enabled scenarios, and it is found that flat RL policies struggle on HAB compared to hierarchical ones, and a hierarchy with independent skills suffers from ‘hand-off problems’.
iGibson 2.0: Object-Centric Simulation for Robot Learning of Everyday Household Tasks
- Computer Science · CoRL
- 2021
The new capabilities of iGibson 2.0 are evaluated to enable robot learning of novel tasks, in the hope of demonstrating the potential of this new simulator to support new directions of research in embodied AI.
IFOR: Iterative Flow Minimization for Robotic Object Rearrangement
- Computer Science · ArXiv
- 2022
This work proposes IFOR, Iterative Flow Minimization for Robotic Object Rearrangement, an end-to-end method for the challenging problem of rearranging unknown objects given an RGBD image of the original and final scenes, and shows that the method applies to cluttered scenes and to the real world while training only on synthetic data.
Continuous Scene Representations for Embodied AI
- Computer Science · ArXiv
- 2022
This work proposes Continuous Scene Representations (CSR), a scene representation constructed by an embodied agent navigating within a space, in which objects and their pairwise relationships are modeled by continuous-valued embeddings in a latent space.
References
Showing 1–10 of 43 references
AllenAct: A Framework for Embodied AI Research
- Computer Science · ArXiv
- 2020
AllenAct is introduced, a modular and flexible learning framework designed around the unique requirements of Embodied AI research; it provides first-class support for a growing collection of embodied environments, tasks, and algorithms.
Combined task and motion planning through an extensible planner-independent interface layer
- Computer Science · 2014 IEEE International Conference on Robotics and Automation (ICRA)
- 2014
This work proposes a new approach that uses off-the-shelf task planners and motion planners and makes no assumptions about their implementation. It relies on a novel representational abstraction that requires only that failures in computing a motion plan for a high-level action be identifiable and expressible as logical predicates at the task level.
HRL4IN: Hierarchical Reinforcement Learning for Interactive Navigation with Mobile Manipulators
- Computer Science · CoRL
- 2019
HRL4IN is proposed, a novel hierarchical RL architecture for interactive navigation tasks that exploits the exploration benefits of HRL over flat RL for long-horizon tasks, thanks to temporally extended commitments towards subgoals, and significantly outperforms its baselines in both task performance and energy efficiency.
SAPIEN: A SimulAted Part-Based Interactive ENvironment
- Computer Science · 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2020
SAPIEN is a realistic and physics-rich simulated environment that hosts a large-scale set of articulated objects, enabling various robotic vision and interaction tasks that require detailed part-level understanding; the authors hope it will open research directions yet to be explored.
Cognitive Mapping and Planning for Visual Navigation
- Computer Science · International Journal of Computer Vision
- 2019
The Cognitive Mapper and Planner is based on a unified joint architecture for mapping and planning, in which mapping is driven by the needs of the task, and on a spatial memory with the ability to plan given an incomplete set of observations about the world.
Target-driven visual navigation in indoor scenes using deep reinforcement learning
- Computer Science · 2017 IEEE International Conference on Robotics and Automation (ICRA)
- 2017
This paper proposes an actor-critic model whose policy is a function of the goal as well as the current state, which allows better generalization, and introduces the AI2-THOR framework, which provides an environment with high-quality 3D scenes and a physics engine.
RLBench: The Robot Learning Benchmark & Learning Environment
- Computer Science · IEEE Robotics and Automation Letters
- 2020
This large-scale benchmark aims to accelerate progress in a number of vision-guided manipulation research areas, including: reinforcement learning, imitation learning, multi-task learning, geometric computer vision, and in particular, few-shot learning.
ROBOTURK: A Crowdsourcing Platform for Robotic Skill Learning through Imitation
- Computer Science · CoRL
- 2018
It is shown that the data obtained through RoboTurk enables policy learning on multi-step manipulation tasks with sparse rewards and that using larger quantities of demonstrations during policy learning provides benefits in terms of both learning consistency and final performance.
Habitat: A Platform for Embodied AI Research
- Computer Science · 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2019
The comparison between learning and SLAM approaches from two recent works is revisited, and evidence is found that learning outperforms SLAM when scaled to an order of magnitude more experience than in previous investigations; the first cross-dataset generalization experiments are also conducted.
Data-Driven Grasp Synthesis—A Survey
- Computer Science · IEEE Transactions on Robotics
- 2014
A review of work on data-driven grasp synthesis and the methodologies for sampling and ranking candidate grasps is provided, with an overview of the different methodologies that draws a parallel to the classical approaches relying on analytic formulations.