Corpus ID: 53116049

Visual Semantic Navigation using Scene Priors

@article{Yang2019VisualSN,
  title={Visual Semantic Navigation using Scene Priors},
  author={Wei Yang and Xiaolong Wang and Ali Farhadi and Abhinav Gupta and Roozbeh Mottaghi},
  journal={ArXiv},
  year={2019},
  volume={abs/1810.06543}
}
How do humans navigate to target objects in novel scenes? Do we use the semantic/functional priors we have built over years to efficiently search and navigate? In this work, we propose to use Graph Convolutional Networks to incorporate prior semantic knowledge into a deep reinforcement learning framework. The agent uses the features from the knowledge graph to predict the actions. For evaluation, we use the AI2-THOR framework. Our experiments show how semantic knowledge improves performance significantly. More importantly, we show improvement in generalization to unseen scenes and/or objects. The supplementary video can be accessed at the following link: this https URL.
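As a hedged illustration of the key method the abstract describes, the sketch below runs a graph convolution over a knowledge graph and concatenates the resulting prior features with visual features before a policy head. The layer sizes, the 4-action space, and all names (`GCNLayer`, `SemanticNavPolicy`) are assumptions for illustration, not the authors' exact architecture.

```python
# Minimal sketch: GCN features over a knowledge graph feed an RL policy head.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    """One graph-convolution step: H' = ReLU(A_hat @ H @ W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, adj_norm, node_feats):
        # adj_norm: (N, N) normalized adjacency; node_feats: (N, in_dim)
        return F.relu(adj_norm @ self.linear(node_feats))

class SemanticNavPolicy(nn.Module):
    def __init__(self, num_nodes=80, node_dim=32, vis_dim=512, num_actions=4):
        super().__init__()
        self.gcn1 = GCNLayer(node_dim, 64)
        self.gcn2 = GCNLayer(64, 16)
        # Policy head consumes visual features plus flattened graph features.
        self.policy = nn.Linear(vis_dim + num_nodes * 16, num_actions)

    def forward(self, adj_norm, node_feats, vis_feats):
        h = self.gcn2(adj_norm, self.gcn1(adj_norm, node_feats))
        joint = torch.cat([vis_feats, h.flatten()], dim=-1)
        return self.policy(joint)  # action logits for the agent

# Toy usage with random inputs.
N = 80
adj = torch.eye(N)          # self-loops only, stands in for a normalized adjacency
nodes = torch.randn(N, 32)  # e.g. word-embedding node features
vis = torch.randn(512)      # e.g. CNN features of the egocentric view
logits = SemanticNavPolicy()(adj, nodes, vis)
print(logits.shape)         # torch.Size([4])
```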

Citations

MaAST: Map Attention with Semantic Transformers for Efficient Visual Navigation
TLDR
This work proposes a method to encode vital scene semantics (such as traversable paths, unexplored areas, and observed scene objects), alongside raw visual streams (such as RGB, depth, and semantic segmentation masks), into a semantically informed, top-down egocentric map representation, and introduces a novel 2-D map attention mechanism.
Learning to Map for Active Semantic Goal Navigation
TLDR
This work proposes a novel framework that actively learns to generate semantic maps outside the field of view of the agent and leverages the uncertainty over the semantic classes in the unobserved areas to decide on long term goals.
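As a rough sketch of the uncertainty-driven goal selection this TLDR describes, the snippet below scores unobserved map cells by the entropy of their predicted class distribution and picks the most uncertain cell as the long-term goal. The map size, class count, and the argmax rule are assumptions, not the paper's exact procedure.

```python
# Pick the unobserved map cell whose predicted class distribution is most uncertain.
import torch

def select_goal(class_probs, observed_mask):
    # class_probs: (C, H, W) softmax output of a map-prediction network
    # observed_mask: (H, W) bool, True where the agent has already observed
    entropy = -(class_probs * class_probs.clamp_min(1e-8).log()).sum(dim=0)  # (H, W)
    entropy[observed_mask] = -float("inf")       # only consider unobserved cells
    idx = torch.argmax(entropy)
    return divmod(idx.item(), entropy.shape[1])  # (row, col) of the chosen goal

probs = torch.softmax(torch.randn(10, 64, 64), dim=0)
mask = torch.zeros(64, 64, dtype=torch.bool)
mask[:32] = True                 # top half already explored
print(select_goal(probs, mask))  # goal lies in the unobserved half
```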
Utilising Prior Knowledge for Visual Navigation: Distil and Adapt
TLDR
This paper proposes to decompose the value function in the actor-critic reinforcement learning algorithm and incorporate the prior in the critic in a novel way that reduces the model complexity and improves model generalisation.
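A minimal sketch of the value-decomposition idea as stated in this TLDR: the critic's value splits into a learned term and a term computed from prior knowledge, so the network only has to model the residual. The prior head, sizes, and names are hypothetical; the paper's actual decomposition may differ.

```python
# Critic whose value is a learned residual plus a prior-derived term.
import torch
import torch.nn as nn

class DecomposedCritic(nn.Module):
    def __init__(self, state_dim=128, prior_dim=16):
        super().__init__()
        self.residual = nn.Linear(state_dim, 1)    # learned part of V(s)
        self.prior_head = nn.Linear(prior_dim, 1)  # maps prior features to a value

    def forward(self, state, prior_feats):
        # V(s) = V_residual(s) + V_prior(prior knowledge about the target/scene)
        return self.residual(state) + self.prior_head(prior_feats)

critic = DecomposedCritic()
v = critic(torch.randn(8, 128), torch.randn(8, 16))
print(v.shape)  # torch.Size([8, 1])
```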
Unsupervised Domain Adaptation for Visual Navigation
TLDR
This paper proposes an unsupervised domain adaptation method for visual navigation that translates the images in the target domain to the source domain such that the translation is consistent with the representations learned by the navigation policy.
Exploiting Scene-specific Features for Object Goal Navigation
TLDR
A new reduced dataset is introduced that speeds up the training of navigation models, a notoriously complex task, and the SMTSC model is proposed, an attention-based model capable of exploiting the correlation between scenes and objects contained in them.
Visual Navigation with Spatial Attention
TLDR
The attention model is shown to improve the agent’s policy and to achieve state-of-the-art results on commonly-used datasets.
Learning Embeddings that Capture Spatial Semantics for Indoor Navigation
TLDR
This work studies how object embeddings that capture spatial semantic priors can guide search and navigation tasks in a structured environment and proposes a method to incorporate such spatial semantic awareness in robots by leveraging pre-trained language models and multi-relational knowledge bases as object embeddings.
Seeing the Un-Scene: Learning Amodal Semantic Maps for Room Navigation
TLDR
A learning-based approach for room navigation using semantic maps that learns to predict top-down belief maps of regions that lie beyond the agent's field of view while modeling architectural and stylistic regularities in houses.
Improving Target-driven Visual Navigation with Attention on 3D Spatial Relationships
TLDR
This paper investigates target-driven visual navigation using deep reinforcement learning (DRL) in 3D indoor scenes, where the task is to train an agent that can intelligently make a series of decisions to arrive at a pre-specified target location from any possible starting position, based only on egocentric views.
VTNet: Visual Transformer Network for Object Goal Navigation
TLDR
A Visual Transformer Network (VTNet) for learning informative visual representation in navigation that embeds object and region features with their location cues as spatial-aware descriptors and then incorporates all the encoded descriptors through attention operations to achieve informative representation for navigation.
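As a rough sketch of the descriptor-plus-attention pattern the VTNet TLDR describes, the snippet below concatenates detection features with their location cues and fuses the resulting descriptors with self-attention. Dimensions, the number of detections, and the fusion layout are assumptions; the actual VTNet architecture is more involved.

```python
# Spatial-aware descriptors (features + boxes) fused by multi-head self-attention.
import torch
import torch.nn as nn

class SpatialDescriptorFusion(nn.Module):
    def __init__(self, feat_dim=256, loc_dim=4, embed_dim=128, heads=4):
        super().__init__()
        self.embed = nn.Linear(feat_dim + loc_dim, embed_dim)  # spatial-aware descriptor
        self.attn = nn.MultiheadAttention(embed_dim, heads, batch_first=True)

    def forward(self, feats, boxes):
        # feats: (B, N, feat_dim) detection features; boxes: (B, N, 4) normalized boxes
        desc = self.embed(torch.cat([feats, boxes], dim=-1))
        fused, _ = self.attn(desc, desc, desc)  # self-attention across detections
        return fused.mean(dim=1)                # pooled navigation representation

out = SpatialDescriptorFusion()(torch.randn(2, 10, 256), torch.rand(2, 10, 4))
print(out.shape)  # torch.Size([2, 128])
```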

References

Showing 1-10 of 55 references
Visual Representations for Semantic Target Driven Navigation
TLDR
This work proposes to use semantic segmentation and detection masks as observations obtained by state-of-the-art computer vision algorithms and use a deep network to learn navigation policies on top of representations that capture spatial layout and semantic contextual cues.
Target-driven visual navigation in indoor scenes using deep reinforcement learning
TLDR
This paper proposes an actor-critic model whose policy is a function of the goal as well as the current state, allowing better generalization, and introduces the AI2-THOR framework, which provides an environment with high-quality 3D scenes and a physics engine.
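A bare-bones sketch of the goal-conditioned actor-critic head this TLDR describes: the policy and value are functions of both the current observation and the target, so one network can serve many goals. All sizes and the simple concatenation fusion are assumptions, not the paper's exact design.

```python
# Actor-critic head conditioned on both the observation and the goal.
import torch
import torch.nn as nn

class GoalConditionedAC(nn.Module):
    def __init__(self, obs_dim=512, goal_dim=512, hidden=256, num_actions=4):
        super().__init__()
        self.fuse = nn.Sequential(nn.Linear(obs_dim + goal_dim, hidden), nn.ReLU())
        self.actor = nn.Linear(hidden, num_actions)  # pi(a | s, g)
        self.critic = nn.Linear(hidden, 1)           # V(s, g)

    def forward(self, obs, goal):
        h = self.fuse(torch.cat([obs, goal], dim=-1))
        return self.actor(h), self.critic(h)

logits, value = GoalConditionedAC()(torch.randn(1, 512), torch.randn(1, 512))
print(logits.shape, value.shape)  # torch.Size([1, 4]) torch.Size([1, 1])
```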
DeepNav: Learning to Navigate Large Cities
TLDR
The proposed DeepNav models are evaluated on 4 held-out cities for navigating to 5 different types of destinations and it is shown that the algorithms outperform previous work that uses hand-crafted features and Support Vector Regression (SVR).
AI2-THOR: An Interactive 3D Environment for Visual AI
TLDR
AI2-THOR consists of near photo-realistic 3D indoor scenes, where AI agents can navigate in the scenes and interact with objects to perform tasks and facilitate building visually intelligent models.
Semi-parametric Topological Memory for Navigation
TLDR
A new memory architecture for navigation in previously unseen environments, inspired by landmark-based navigation in animals, that consists of a (non-parametric) graph with nodes corresponding to locations in the environment and a deep network capable of retrieving nodes from the graph based on observations.
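A simplified sketch of the semi-parametric topological memory idea: locations are nodes in a plain graph, and a learned embedding retrieves the stored node closest to the current observation. The embedding network is stubbed with a random linear layer, and the shortest-path planner SPTM builds on top is omitted for brevity.

```python
# Non-parametric graph of visited locations plus embedding-based localization.
import torch

class TopologicalMemory:
    def __init__(self, embed):
        self.embed = embed  # maps observation -> feature vector
        self.nodes, self.edges = [], []

    def add(self, obs, link_to_last=True):
        self.nodes.append(self.embed(obs))
        if link_to_last and len(self.nodes) > 1:
            self.edges.append((len(self.nodes) - 2, len(self.nodes) - 1))

    def localize(self, obs):
        # Retrieve the stored node most similar to the current observation.
        q = self.embed(obs)
        sims = torch.stack([torch.cosine_similarity(q, n, dim=0) for n in self.nodes])
        return int(torch.argmax(sims))

embed = torch.nn.Linear(64, 32)  # stand-in for a trained retrieval network
mem = TopologicalMemory(lambda o: embed(o).detach())
for _ in range(5):
    mem.add(torch.randn(64))
print(mem.localize(torch.randn(64)), mem.edges)
```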
Cognitive Mapping and Planning for Visual Navigation
TLDR
The Cognitive Mapper and Planner is based on a unified joint architecture for mapping and planning, such that the mapping is driven by the needs of the task, and a spatial memory with the ability to plan given an incomplete set of observations about the world.
Self-Supervised Deep Reinforcement Learning with Generalized Computation Graphs for Robot Navigation
TLDR
A generalized computation graph is proposed that subsumes value-based model-free methods and model-based methods, and is instantiated to form a navigation model that learns from raw images, is sample-efficient, and outperforms single-step and double-step double Q-learning.
Visual Translation Embedding Network for Visual Relation Detection
TLDR
This work proposes a novel feature extraction layer that enables object-relation knowledge transfer in a fully-convolutional fashion, supporting training and inference in a single forward/backward pass, and proposes the first end-to-end relation detection network.
Contextual Priming and Feedback for Faster R-CNN
TLDR
This paper proposes to augment Faster R-CNN with a semantic segmentation network and uses segmentation to provide top-down iterative feedback via two-stage training; results indicate that all three contributions improve performance on object detection, semantic segmentation, and region proposal generation.
segDeepM: Exploiting segmentation and context in deep neural networks for object detection
TLDR
This paper frames the problem as inference in a Markov Random Field, in which each detection hypothesis scores object appearance as well as contextual information using Convolutional Neural Networks, and allows the hypothesis to choose and score a segment out of a large pool of accurate object segmentation proposals.