Corpus ID: 208614341

Commonsense and Semantic-Guided Navigation through Language in Embodied Environment

  • Dian Yu, C. Khatri, Alexandros Papangelis, Andrea Madotto, M. Namazifar, Joost Huizinga, Adrien Ecoffet, Huaixiu Zheng, Piero Molino, J. Clune, Z. Yu, Kenji Sagae, G. Tur
  • Published in ViGIL@NeurIPS 2019
  • Computer Science
  • Grounded language navigation tasks such as room navigation (e.g. “go to the kitchen”) and embodied question answering (e.g. “what color is the car?”) in realistic 3D environments require the agent to generalize to unseen environments. However, previous research suggests that vision inputs do not contribute to multi-modal performance. Humans, on the other hand, use commonsense and semantic understanding of both the language instruction and the visual input to navigate in new environments. In…
    1 Citation

    References


    Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments
    • 292 citations
    • Highly Influential
    Embodied Question Answering
    • 235 citations
    Learning to Navigate in Cities Without a Map
    • 129 citations
    Speaker-Follower Models for Vision-and-Language Navigation
    • 113 citations
    • Highly Influential
    IQA: Visual Question Answering in Interactive Environments
    • 160 citations
    Building Generalizable Agents with a Realistic and Rich 3D Environment
    • 178 citations
    Embodied Question Answering in Photorealistic Environments With Point Cloud Perception
    • 36 citations
    Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation
    • 115 citations
    • Highly Influential
    Neural Modular Control for Embodied Question Answering
    • 55 citations