Visual Room Rearrangement

@article{Weihs2021VisualRR,
  title={Visual Room Rearrangement},
  author={Luca Weihs and Matt Deitke and Aniruddha Kembhavi and Roozbeh Mottaghi},
  journal={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021},
  pages={5918-5927}
}
There has been significant recent progress in the field of Embodied AI, with researchers developing models and algorithms that enable embodied agents to navigate and interact within completely unseen environments. In this paper, we propose a new dataset and baseline models for the task of Rearrangement. We focus in particular on Room Rearrangement: an agent begins by exploring a room and recording objects' initial configurations. We then remove the agent and change the poses and states…
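To make this two-phase protocol concrete, here is a minimal toy sketch of the episode structure in Python. The Env and Agent classes, the integer "poses", and the restore action are hypothetical stand-ins invented for illustration; they are not the paper's actual AI2-THOR-based environment or API.

class Env:
    """Toy stand-in for the rearrangement environment: object 'poses'
    are plain integers rather than full 6-DoF poses and openness states."""

    def __init__(self):
        self.goal = {"mug": 0, "book": 1}   # configuration during walkthrough
        self.poses = dict(self.goal)

    def shuffle_objects(self):
        # Between phases, the benchmark changes the poses/states of a
        # subset of objects while the agent is removed from the scene.
        self.poses["mug"] = 5

    def step(self, action):
        # Toy interaction: a single action that puts an object back.
        if action.startswith("restore:"):
            obj = action.split(":", 1)[1]
            self.poses[obj] = self.goal[obj]

    def fixed_fraction(self):
        # Fraction of objects whose final pose matches the goal pose.
        fixed = sum(self.poses[o] == self.goal[o] for o in self.goal)
        return fixed / len(self.goal)

class Agent:
    """Toy agent: records poses in the walkthrough, then restores them."""

    def walkthrough(self, env):
        self.memory = dict(env.poses)           # phase 1: record configurations

    def unshuffle(self, env):
        for obj, pose in self.memory.items():   # phase 2: restore them
            if env.poses[obj] != pose:
                env.step(f"restore:{obj}")

env, agent = Env(), Agent()
agent.walkthrough(env)        # agent explores and records initial poses
env.shuffle_objects()         # agent is removed; object poses are changed
agent.unshuffle(env)          # agent must recover the initial configuration
print(env.fixed_fraction())   # 1.0 -> all misplaced objects were restored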

Citations

Learning to Explore, Navigate and Interact for Visual Room Rearrangement
TLDR
A three-phased modular architecture (TMA) for visual room rearrangement that maximizes performance by combining learned modules with hand-crafted feature-engineering modules, retaining the advantage of learning while reducing its cost.
Semantically Grounded Object Matching for Robust Robotic Scene Rearrangement
TLDR
This work presents a novel approach to object matching that uses a large pre-trained vision-language model to match objects in a cross-instance setting by leveraging semantics together with visual features as a more robust, and much more general, measure of similarity.
Object Manipulation via Visual Target Localization
TLDR
This work proposes Manipulation via Visual Object Location Estimation (m-VOLE), an approach that explores the environment in search of target objects, computes their 3D coordinates once they are located, and then continues to estimate their 3D locations even when the objects are not visible, thus robustly aiding the task of manipulating these objects throughout the episode.
Continuous Scene Representations for Embodied AI
TLDR
This work proposes Continuous Scene Representations (CSR), a scene representation constructed by an embodied agent navigating within a space, in which objects and their pair-wise relationships are modeled by continuous-valued embeddings in a latent space.
Shaping embodied agent behavior with activity-context priors from egocentric video
TLDR
This work introduces an approach to discover activity-context priors from in-the-wild egocentric video captured with human-worn cameras, encoding the video-based prior as an auxiliary reward function that encourages an agent to bring compatible objects together before attempting an interaction.
CLIP on Wheels: Zero-Shot Object Navigation as Object Localization and Exploration
TLDR
This paper translates the success of zero-shot vision models to the popular embodied AI task of object navigation, and finds that a straightforward CoW, with CLIP-based object localization plus classical exploration, and no additional training, often outperforms learnable approaches in terms of success, efficiency, and robustness to dataset distribution shift.
Where Does It Belong? Autonomous Object Mapping in Open-World Settings
TLDR
The results show that, even with a targeted training set, the approach outperforms the baseline for most test cases, and the method's effectiveness is demonstrated in real-robot experiments.
ASC me to Do Anything: Multi-task Training for Embodied AI
TLDR
Atomic Skill Completion (ASC) is proposed, an approach to multi-task training for Embodied AI in which a set of atomic skills shared across multiple tasks is composed to perform those tasks.
IFOR: Iterative Flow Minimization for Robotic Object Rearrangement
TLDR
This work proposes IFOR, Iterative Flow Minimization for Robotic Object Rearrangement, an end-to-end method for the challenging problem of rearranging unknown objects given RGBD images of the original and final scenes, and shows that the method applies to cluttered scenes and in the real world while training only on synthetic data.
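Reading the summary above, the core loop can be sketched as: estimate dense flow between the current and goal images, move the object whose pixels carry the most residual flow, and repeat until the scenes match. The sketch below is only our paraphrase of that idea; estimate_flow, segment_objects, and plan_pick_place are hypothetical placeholders for IFOR's learned and engineered components, not its actual interfaces.

import numpy as np

def iterative_flow_rearrange(get_rgbd, goal_rgbd, estimate_flow,
                             segment_objects, plan_pick_place,
                             max_iters=10, tol=2.0):
    for _ in range(max_iters):
        current = get_rgbd()
        # Dense 2D flow field from the current scene image to the goal image.
        flow = estimate_flow(current, goal_rgbd)      # (H, W, 2) array
        magnitude = np.linalg.norm(flow, axis=-1)     # per-pixel flow size
        if magnitude.mean() < tol:                    # scenes already match
            return True
        # Move the object whose pixels carry the largest residual flow.
        masks = segment_objects(current)              # list of (H, W) bool masks
        worst = max(masks, key=lambda m: magnitude[m].mean())
        plan_pick_place(worst, flow)                  # execute one pick-place
    return False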

References

Showing 1-10 of 57 references
AllenAct: A Framework for Embodied AI Research
TLDR
AllenAct is introduced, a modular and flexible learning framework designed with a focus on the unique requirements of Embodied AI research, providing first-class support for a growing collection of embodied environments, tasks, and algorithms.
Rearrangement: A Challenge for Embodied AI
TLDR
A framework for research and evaluation in Embodied AI is described, based on a canonical task, Rearrangement, which can focus the development of new techniques and serve as a source of trained models that can be transferred to other settings.
Cognitive Mapping and Planning for Visual Navigation
TLDR
The Cognitive Mapper and Planner is based on a unified joint architecture for mapping and planning, such that the mapping is driven by the needs of the task, and uses a spatial memory with the ability to plan given an incomplete set of observations about the world.
Monte-Carlo Tree Search for Efficient Visually Guided Rearrangement Planning
TLDR
This work introduces an efficient and scalable rearrangement planning method, based on a Monte-Carlo Tree Search exploration strategy, and develops an integrated approach for robust multi-object workspace state estimation from a single uncalibrated RGB camera using a deep neural network trained only with synthetic data.
IQA: Visual Question Answering in Interactive Environments
TLDR
The Hierarchical Interactive Memory Network (HIMN) is proposed, consisting of a factorized set of controllers that allow the system to operate at multiple levels of temporal abstraction, and outperforms popular single-controller methods on IQUAD V1.
Occupancy Anticipation for Efficient Exploration and Navigation
TLDR
This work proposes occupancy anticipation, where the agent uses its egocentric RGB-D observations to infer the occupancy state beyond the visible regions, which facilitates efficient exploration and navigation in 3D environments.
Pick and Place Without Geometric Object Models
TLDR
This approach can solve a challenging class of pick-place and regrasping problems where the exact geometry of the objects to be handled is unknown and shows a major improvement relative to a shape primitives baseline.
Two Body Problem: Collaborative Visual Task Completion
TLDR
This paper studies the problem of learning to collaborate directly from pixels in AI2-THOR and demonstrates the benefits of explicit and implicit modes of communication to perform visual tasks.
Object Goal Navigation using Goal-Oriented Semantic Exploration
TLDR
A modular system called 'Goal-Oriented Semantic Exploration' is presented, which builds an episodic semantic map and uses it to explore the environment efficiently based on the goal object category, and outperforms a wide range of baselines, including end-to-end learning-based methods as well as modular map-based methods.
RoboTHOR: An Open Simulation-to-Real Embodied AI Platform
TLDR
RoboTHOR offers a framework of simulated environments paired with physical counterparts to systematically explore and overcome the challenges of simulation-to-real transfer, and a platform where researchers across the globe can remotely test their embodied models in the physical world.