ManipulaTHOR: A Framework for Visual Object Manipulation

@inproceedings{Ehsani2021ManipulaTHORAF,
  title={ManipulaTHOR: A Framework for Visual Object Manipulation},
  author={Kiana Ehsani and Winson Han and Alvaro Herrasti and Eli VanderBilt and Luca Weihs and Eric Kolve and Aniruddha Kembhavi and Roozbeh Mottaghi},
  booktitle={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021},
  pages={4495--4504}
}
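As a minimal sketch, the BibTeX entry above could be cited from a LaTeX document as follows; the bibliography file name `refs.bib` and the surrounding document are illustrative assumptions:

```latex
% Assumes the BibTeX entry above is saved in refs.bib
\documentclass{article}
\begin{document}
ManipulaTHOR~\cite{Ehsani2021ManipulaTHORAF} extends AI2-THOR
with visual object manipulation.
\bibliographystyle{ieeetr}
\bibliography{refs}
\end{document}
```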
The domain of Embodied AI has recently witnessed substantial progress, particularly in navigating agents within their environments. These early successes have laid the building blocks for the community to tackle tasks that require agents to actively interact with objects in their environment. Object manipulation is an established research domain within the robotics community and poses several challenges including manipulator motion, grasping and long-horizon planning, particularly when dealing… 
Object Manipulation via Visual Target Localization
TLDR
This work proposes Manipulation via Visual Object Location Estimation (m-VOLE), an approach that explores the environment in search of target objects, computes their 3D coordinates once they are located, and then continues to estimate their 3D locations even when the objects are not visible, thus robustly aiding the task of manipulating these objects throughout the episode.
Towards Disturbance-Free Visual Mobile Manipulation
TLDR
This work develops a new disturbance-avoidance methodology at the heart of which is the auxiliary task of disturbance prediction, which greatly enhances sample efficiency and final performance by knowledge distillation of disturbance into the agent.
Core Challenges in Embodied Vision-Language Planning
TLDR
A taxonomy is proposed to unify Embodied Vision-Language Planning tasks, a family of prominent embodied navigation and manipulation problems that jointly use computer vision and natural language, and an in-depth analysis and comparison of the new and current algorithmic approaches, metrics, simulated environments, as well as the datasets used for EVLP tasks are presented.
ManiSkill: Generalizable Manipulation Skill Benchmark with Large-Scale Demonstrations
TLDR
The proposed SAPIEN Manipulation Skill Benchmark (ManiSkill) evaluates manipulation skills over diverse objects in a full-physics simulator and provides baselines using 3D deep learning and learning-from-demonstration (LfD) algorithms.
RFUniverse: A Physics-based Action-centric Interactive Environment for Everyday Household Tasks
TLDR
A novel physics-based action-centric environment, RFUniverse, is proposed for robot learning of everyday household tasks, which supports interactions among 87 atomic actions and 8 basic object types in a visually and physically plausible way.
ASC me to Do Anything: Multi-task Training for Embodied AI
TLDR
Atomic Skill Completion (ASC) is proposed, an approach for multi-task training in Embodied AI, where a set of atomic skills shared across multiple tasks is composed to perform those tasks.
Habitat 2.0: Training Home Assistants to Rearrange their Habitat
TLDR
Habitat 2.0 is introduced, a simulation platform for training virtual robots in interactive 3D environments and complex physics-enabled scenarios, and it is found that flat RL policies struggle on HAB compared to hierarchical ones, and a hierarchy with independent skills suffers from ‘hand-off problems’.
iGibson 2.0: Object-Centric Simulation for Robot Learning of Everyday Household Tasks
TLDR
The new capabilities of iGibson 2.0 are evaluated to enable robot learning of novel tasks, in the hope of demonstrating the potential of this new simulator to support new directions of research in embodied AI.
IFOR: Iterative Flow Minimization for Robotic Object Rearrangement
TLDR
This work proposes IFOR, Iterative Flow Minimization for Robotic Object Rearrangement, an end-to-end method for the challenging problem of object rearrangement for unknown objects given an RGBD image of the original and final scenes and shows that this method applies to cluttered scenes, and in the real world, while training only on synthetic data.
Continuous Scene Representations for Embodied AI
TLDR
This work proposes Continuous Scene Representations (CSR), a scene representation constructed by an embodied agent navigating within a space, where objects and their relationships are modeled by continuous valued embeddings, to embed pair-wise relationships between objects in a latent space.

References

Showing 1-10 of 43 references
AllenAct: A Framework for Embodied AI Research
TLDR
AllenAct is introduced, a modular and flexible learning framework designed with a focus on the unique requirements of Embodied AI research that provides first-class support for a growing collection of embodied environments, tasks and algorithms.
Combined task and motion planning through an extensible planner-independent interface layer
TLDR
This work proposes a new approach that uses off-the-shelf task planners and motion planners and makes no assumptions about their implementation and uses a novel representational abstraction that requires only that failures in computing a motion plan for a high-level action be identifiable and expressible in the form of logical predicates at the task level.
HRL4IN: Hierarchical Reinforcement Learning for Interactive Navigation with Mobile Manipulators
TLDR
HRL4IN is proposed, a novel hierarchical RL architecture for interactive navigation tasks that exploits the exploration benefits of HRL over flat RL for long-horizon tasks, thanks to temporally extended commitments toward subgoals, and significantly outperforms its baselines in both task performance and energy efficiency.
SAPIEN: A SimulAted Part-Based Interactive ENvironment
  • Fanbo Xiang, Yuzhe Qin, Hao Su
  • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020
TLDR
SAPIEN is a realistic and physics-rich simulated environment that hosts a large-scale set of articulated objects, enabling various robotic vision and interaction tasks that require detailed part-level understanding; the authors hope it will open research directions yet to be explored.
Cognitive Mapping and Planning for Visual Navigation
TLDR
The Cognitive Mapper and Planner is based on a unified joint architecture for mapping and planning, such that the mapping is driven by the needs of the task, and includes a spatial memory with the ability to plan given an incomplete set of observations about the world.
Target-driven visual navigation in indoor scenes using deep reinforcement learning
TLDR
This paper proposes an actor-critic model whose policy is a function of the goal as well as the current state, which allows better generalization and proposes the AI2-THOR framework, which provides an environment with high-quality 3D scenes and a physics engine.
RLBench: The Robot Learning Benchmark & Learning Environment
TLDR
This large-scale benchmark aims to accelerate progress in a number of vision-guided manipulation research areas, including: reinforcement learning, imitation learning, multi-task learning, geometric computer vision, and in particular, few-shot learning.
ROBOTURK: A Crowdsourcing Platform for Robotic Skill Learning through Imitation
TLDR
It is shown that the data obtained through RoboTurk enables policy learning on multi-step manipulation tasks with sparse rewards and that using larger quantities of demonstrations during policy learning provides benefits in terms of both learning consistency and final performance.
Habitat: A Platform for Embodied AI Research
TLDR
The comparison between learning and SLAM approaches from two recent works is revisited, and evidence is found that learning outperforms SLAM if scaled to an order of magnitude more experience than previous investigations; the first cross-dataset generalization experiments are also conducted.
Data-Driven Grasp Synthesis—A Survey
TLDR
A review of work on data-driven grasp synthesis and the methodologies for sampling and ranking candidate grasps is provided, drawing a parallel to classical approaches that rely on analytic formulations.