Transferable Representation Learning in Vision-and-Language Navigation

  title={Transferable Representation Learning in Vision-and-Language Navigation},
  author={Haoshuo Huang and Vihan Jain and Harsh Mehta and Alexander Ku and Gabriel Magalh{\~a}es and Jason Baldridge and E. Ie},
  journal={2019 IEEE/CVF International Conference on Computer Vision (ICCV)},
  • Published 2019
  • Computer Science
  • Vision-and-Language Navigation (VLN) tasks such as Room-to-Room (R2R) require machine agents to interpret natural language instructions and learn to act in visually realistic environments to achieve navigation goals. [...] Key method: Specifically, the representations are adapted to solve both a cross-modal sequence alignment and sequence coherence task. In the sequence alignment task, the model determines whether an instruction corresponds to a sequence of visual frames.
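The abstract frames the sequence alignment task as a binary decision: does this instruction describe this sequence of visual frames? The sketch below shows one minimal way such an objective could be set up. Every design choice in it is an illustrative assumption, not the paper's method: instructions and paths arrive already encoded as lists of token and frame vectors, each side is mean-pooled, the pair score is a dot product, negatives are made by pairing an instruction with a different path drawn at random, and the loss is binary cross-entropy on the score. The helper names (`mean_pool`, `alignment_logit`, `make_alignment_pairs`, `alignment_loss`) are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical representation (assumption): an instruction is a list of token
# vectors and a path is a list of frame vectors, all of the same dimension.
Vector = List[float]
Encoded = Sequence[Vector]


def mean_pool(vectors: Encoded) -> Vector:
    """Average a non-empty sequence of equal-length vectors into one vector."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def alignment_logit(instruction: Encoded, path: Encoded) -> float:
    """Score a pair as the dot product of the pooled embeddings (assumed scorer)."""
    a = mean_pool(instruction)
    b = mean_pool(path)
    return sum(x * y for x, y in zip(a, b))


def bce_with_logit(logit: float, label: int) -> float:
    """Numerically stable binary cross-entropy computed from a raw logit."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def make_alignment_pairs(
    instructions: Sequence[Encoded], paths: Sequence[Encoded], rng: random.Random
) -> List[Tuple[Encoded, Encoded, int]]:
    """Pair each instruction with its own path (label 1) and with one other
    path chosen uniformly at random (label 0). Assumed negative sampling."""
    n = len(instructions)
    if n < 2:
        raise ValueError("need at least two instruction-path pairs")
    pairs = []
    for i in range(n):
        pairs.append((instructions[i], paths[i], 1))
        j = rng.randrange(n - 1)  # draw from the n - 1 indices other than i
        if j >= i:
            j += 1
        pairs.append((instructions[i], paths[j], 0))
    return pairs


def alignment_loss(
    instructions: Sequence[Encoded], paths: Sequence[Encoded], seed: int = 0
) -> float:
    """Mean binary cross-entropy over positive and mismatched-negative pairs."""
    rng = random.Random(seed)
    pairs = make_alignment_pairs(instructions, paths, rng)
    total = sum(bce_with_logit(alignment_logit(x, p), y) for x, p, y in pairs)
    return total / len(pairs)


if __name__ == "__main__":
    # Toy 2-d embeddings, purely illustrative.
    instructions = [[[1.0, 0.0], [0.8, 0.2]], [[0.0, 1.0]]]
    paths = [[[0.9, 0.1]], [[0.1, 0.9], [0.0, 1.0]]]
    print(alignment_loss(instructions, paths))
```

Random mismatched paths are the simplest negatives to build; negatives that are closer to the true path (for example, perturbed versions of it) would likely make the discrimination harder and the learned representation more sensitive to fine-grained instruction content.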

    Publications citing this paper.

    Environment-agnostic Multitask Learning for Natural Language Grounded Navigation
    • 5 citations
    VALAN: Vision and Language Agent Navigation
    • 4 citations
    BabyWalk: Going Farther in Vision-and-Language Navigation by Taking Baby Steps
    • 4 citations
    Sub-Instruction Aware Vision-and-Language Navigation
    • 4 citations
    Object-and-Action Aware Model for Visual Language Navigation

