Embodied Agents for Efficient Exploration and Smart Scene Description
@article{Bigazzi2023EmbodiedAF, title={Embodied Agents for Efficient Exploration and Smart Scene Description}, author={Roberto Bigazzi and Marcella Cornia and Silvia Cascianelli and Lorenzo Baraldi and Rita Cucchiara}, journal={ArXiv}, year={2023}, volume={abs/2301.07150} }
The development of embodied agents that can communicate with humans in natural language has gained increasing interest over the last years, as it facilitates the diffusion of robotic platforms in human-populated environments. As a step towards this objective, in this work, we tackle a setting for visual navigation in which an autonomous agent needs to explore and map an unseen indoor environment while portraying interesting scenes with natural language descriptions. To this end, we propose…
References
Out of the Box: Embodied Navigation in the Real World
- Computer Science, CAIP
- 2021
This work describes the architectural discrepancies that damage the Sim2Real adaptation ability of models trained on the Habitat simulator and proposes a novel solution tailored towards deployment in real-world scenarios.
Explore and Explain: Self-supervised Navigation and Recounting
- Computer Science, 2020 25th International Conference on Pattern Recognition (ICPR)
- 2021
This paper devises a novel embodied setting in which an agent needs to explore a previously unknown environment while recounting what it sees along the path, and integrates a novel self-supervised exploration module with penalty and a fully-attentive captioning model for explanation.
Embodied scene description
- Computer Science, Auton. Robots
- 2022
Embodied Scene Description is proposed, which exploits the embodiment ability of the agent to find an optimal viewpoint in its environment for scene description tasks. A mobile application is also developed, which can be used to help visually impaired people better understand their surroundings.
An Exploration of Embodied Visual Exploration
- Computer Science, International Journal of Computer Vision
- 2021
This work presents a taxonomy of existing visual exploration algorithms, creates a standard framework for benchmarking them, and performs a thorough empirical study of the four state-of-the-art paradigms using the proposed framework with two photorealistic simulated 3D environments.
Gibson Env: Real-World Perception for Embodied Agents
- Computer Science, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
- 2018
This paper investigates developing real-world perception for active agents, proposes Gibson Environment for this purpose, and showcases a set of perceptual tasks learned therein.
SMArT: Training Shallow Memory-aware Transformers for Robotic Explainability
- Computer Science, 2020 IEEE International Conference on Robotics and Automation (ICRA)
- 2020
This paper proposes a fully-attentive captioning algorithm that can provide state-of-the-art performance on language generation while restricting its computational demands, and that incorporates a novel memory-aware encoding of image regions.
Occupancy Anticipation for Efficient Exploration and Navigation
- Computer Science, ECCV
- 2020
This work proposes occupancy anticipation, where the agent uses its egocentric RGB-D observations to infer the occupancy state beyond the visible regions, which facilitates efficient exploration and navigation in 3D environments.
Sim-to-Real Transfer for Vision-and-Language Navigation
- Computer Science, CoRL
- 2020
To bridge the gap between the high-level discrete action space learned by the VLN agent and the robot's low-level continuous action space, a subgoal model is proposed to identify nearby waypoints, and domain randomization is used to mitigate visual domain differences.
Object Goal Navigation using Goal-Oriented Semantic Exploration
- Computer Science, NeurIPS
- 2020
A modular system called "Goal-Oriented Semantic Exploration" is proposed, which builds an episodic semantic map and uses it to explore the environment efficiently based on the goal object category. It outperforms a wide range of baselines, including end-to-end learning-based methods as well as modular map-based methods.
Multimodal attention networks for low-level vision-and-language navigation
- Computer Science, Comput. Vis. Image Underst.
- 2021