Object Manipulation via Visual Target Localization

@article{Ehsani2022ObjectMV,
  title={Object Manipulation via Visual Target Localization},
  author={Kiana Ehsani and Ali Farhadi and Aniruddha Kembhavi and Roozbeh Mottaghi},
  journal={ArXiv},
  year={2022},
  volume={abs/2203.08141}
}
Object manipulation is a critical skill required for Embodied AI agents interacting with the world around them. Training agents to manipulate objects poses many challenges. These include occlusion of the target object by the agent's arm, noisy object detection and localization, and the target frequently going out of view as the agent moves around in the scene. We propose Manipulation via Visual Object Location Estimation (m-VOLE), an approach that explores the environment in search for target…
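The abstract names visual object location estimation as the core idea but is truncated here. As a rough illustration of what such an estimate could look like, the sketch below back-projects the depth pixels of a detected target mask into world coordinates and takes their centroid. The function name, the assumed availability of a depth image, camera intrinsics, and agent pose, and the centroid heuristic are all illustrative assumptions, not details taken from the paper.

import numpy as np

def estimate_target_location(depth, mask, intrinsics, cam_to_world):
    """Back-project masked depth pixels into world coordinates and average them.

    depth:        (H, W) depth image in meters
    mask:         (H, W) boolean mask of the detected target object
    intrinsics:   (3, 3) camera intrinsic matrix
    cam_to_world: (4, 4) camera-to-world transform (agent/camera pose)
    Returns an estimated 3D world position of the target, or None if unseen.
    """
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:  # target not visible in this frame
        return None
    z = depth[ys, xs]
    fx, fy = intrinsics[0, 0], intrinsics[1, 1]
    cx, cy = intrinsics[0, 2], intrinsics[1, 2]
    # Pinhole back-projection from pixel coordinates to camera-frame 3D points.
    x_cam = (xs - cx) * z / fx
    y_cam = (ys - cy) * z / fy
    pts_cam = np.stack([x_cam, y_cam, z, np.ones_like(z)], axis=1)  # (N, 4)
    pts_world = (cam_to_world @ pts_cam.T).T[:, :3]
    # Centroid of the visible surface points serves as a crude location estimate.
    return pts_world.mean(axis=0)

Since the abstract emphasizes occlusion and the target going out of view, an agent would presumably cache the last such estimate and update it with its own egomotion whenever the detector returns nothing; m-VOLE's actual mechanism may differ from this sketch.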

ProcTHOR: Large-Scale Embodied AI Using Procedural Generation

TLDR
PROCTHOR, a proposed framework for procedural generation of Embodied AI environments, enables sampling of arbitrarily large datasets of diverse, interactive, customizable, and performant virtual environments for training and evaluating embodied agents across navigation, interaction, and manipulation tasks.

References


ManipulaTHOR: A Framework for Visual Object Manipulation

TLDR
This work proposes a framework for object manipulation built upon the physics-enabled, visually rich AI2-THOR framework and presents a new challenge to the Embodied AI community known as ArmPointNav, which extends the popular point navigation task to object manipulation and offers new challenges including 3D obstacle avoidance.

Pushing it out of the Way: Interactive Visual Navigation

TLDR
This paper introduces the Neural Interaction Engine (NIE) to explicitly predict the change in the environment caused by the agent’s actions, and finds that agents equipped with an NIE exhibit significant improvements in their navigational capabilities.

Visual Room Rearrangement

TLDR
The experiments show that solving this challenging interactive task, which involves navigation and object interaction, is beyond the capabilities of current state-of-the-art techniques for embodied tasks, and that performance remains far from perfect on tasks of this type.

Integrating Egocentric Localization for More Realistic Point-Goal Navigation Agents

TLDR
This work develops point-goal navigation agents that rely on visual estimates of egomotion under noisy action dynamics, enabling seamless adaptation to changing dynamics (a different robot or floor type) by simply re-calibrating the visual odometry model.

Teaching Agents how to Map: Spatial Reasoning for Multi-Object Navigation

TLDR
This work shows that learning to estimate metrics that quantify the spatial relationship between the agent's current location and a goal to reach has a strong positive impact in Multi-Object Navigation settings and significantly improves the performance of different baseline agents.

Robotic pick-and-place of novel objects in clutter with multi-affordance grasping and cross-domain image matching

TLDR
This work presents a robotic pick-and-place system capable of grasping and recognizing both known and novel objects in cluttered environments, handling a wide range of object categories without needing any task-specific training data for novel objects.

Object Goal Navigation using Goal-Oriented Semantic Exploration

TLDR
A modular system called 'Goal-Oriented Semantic Exploration' builds an episodic semantic map and uses it to explore the environment efficiently based on the goal object category, outperforming a wide range of baselines including end-to-end learning-based methods as well as modular map-based methods.

Target-driven visual navigation in indoor scenes using deep reinforcement learning

TLDR
This paper proposes an actor-critic model whose policy is a function of the goal as well as the current state, which allows better generalization, and introduces the AI2-THOR framework, which provides an environment with high-quality 3D scenes and a physics engine.

Learning to See before Learning to Act: Visual Pre-training for Manipulation

TLDR
It is found that pre-training on vision tasks significantly improves generalization and sample efficiency for learning to manipulate objects, and that directly transferring model parameters from vision networks to affordance prediction networks can result in successful zero-shot adaptation.

ObjectNav Revisited: On Evaluation of Embodied Agents Navigating to Objects

TLDR
This document summarizes the consensus recommendations of a working group on ObjectNav, covering subtle but important details of evaluation criteria, the agent's embodiment parameters, and the characteristics of the environments in which the task is carried out.
...