Embodied Agents for Efficient Exploration and Smart Scene Description

Roberto Bigazzi, Marcella Cornia, Silvia Cascianelli, Lorenzo Baraldi, Rita Cucchiara
Abstract—The development of embodied agents that can communicate with humans in natural language has gained increasing interest in recent years, as it facilitates the diffusion of robotic platforms in human-populated environments. As a step towards this objective, in this work, we tackle a setting for visual navigation in which an autonomous agent needs to explore and map an unseen indoor environment while portraying interesting scenes with natural language descriptions. To this end, we propose…

Out of the Box: Embodied Navigation in the Real World

This work describes the architectural discrepancies that damage the Sim2Real adaptation ability of models trained on the Habitat simulator and proposes a novel solution tailored towards deployment in real-world scenarios.

Explore and Explain: Self-supervised Navigation and Recounting

This paper devises a novel embodied setting in which an agent needs to explore a previously unknown environment while recounting what it sees along the path, integrating a novel penalty-driven self-supervised exploration module with a fully-attentive captioning model for explanation.

Embodied scene description

The Embodied Scene Description task is proposed, which exploits the agent's embodiment to find an optimal viewpoint in its environment for scene description; a mobile application is also developed that can assist visually impaired people in better understanding their surroundings.

An Exploration of Embodied Visual Exploration

This work presents a taxonomy of existing visual exploration algorithms, creates a standard framework for benchmarking them, and performs a thorough empirical study of four state-of-the-art paradigms using the proposed framework with two photorealistic simulated 3D environments.

Gibson Env: Real-World Perception for Embodied Agents

This paper investigates developing real-world perception for active agents, proposes Gibson Environment for this purpose, and showcases a set of perceptual tasks learned therein.

SMArT: Training Shallow Memory-aware Transformers for Robotic Explainability

This paper proposes a fully-attentive captioning algorithm that achieves state-of-the-art performance on language generation while restricting its computational demands, and incorporates a novel memory-aware encoding of image regions.

Occupancy Anticipation for Efficient Exploration and Navigation

This work proposes occupancy anticipation, where the agent uses its egocentric RGB-D observations to infer the occupancy state beyond the visible regions, which facilitates efficient exploration and navigation in 3D environments.

Sim-to-Real Transfer for Vision-and-Language Navigation

To bridge the gap between the high-level discrete action space learned by the VLN agent and the robot's low-level continuous action space, a subgoal model is proposed to identify nearby waypoints, and domain randomization is used to mitigate visual domain differences.

Object Goal Navigation using Goal-Oriented Semantic Exploration

A modular system called 'Goal-Oriented Semantic Exploration' is proposed, which builds an episodic semantic map and uses it to explore the environment efficiently based on the goal object category, outperforming a wide range of baselines including end-to-end learning-based methods as well as modular map-based methods.