Cross-View Policy Learning for Street Navigation

@inproceedings{Li2019CrossViewPL,
  title={Cross-View Policy Learning for Street Navigation},
  author={Ang Li and Huiyi Hu and Piotr Wojciech Mirowski and Mehrdad Farajtabar},
  booktitle={2019 IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2019},
  pages={8099-8108}
}
The ability to navigate from visual observations in unfamiliar environments is a core component of intelligent agents and an ongoing challenge for Deep Reinforcement Learning (RL). Street View can be a sensible testbed for such RL agents, because it provides real-world photographic imagery at ground level, with diverse street appearances; it has been made into an interactive environment called StreetLearn and used for research on navigation. However, goal-driven street navigation agents have… 

Learning To Follow Directions in Street View

An instruction-following task that requires all of the above, and which combines the practicality of simulated environments with the challenges of ambiguous, noisy real world data, is proposed.

CityLearn: Diverse Real-World Environments for Sample-Efficient Navigation Policy Learning

This paper first leverages place recognition and deep learning techniques, combined with goal-destination feedback, to generate compact, bimodal image representations that can then be used to effectively learn control policies from a small amount of experience.

Learning to Stop: A Simple yet Effective Approach to Urban Vision-Language Navigation

This work proposes Learning to Stop (L2Stop), a simple yet effective policy module that differentiates STOP from other actions in Vision-and-Language Navigation, and achieves a new state of the art on the challenging urban VLN dataset Touchdown.

Learning to Navigate

  • Piotr Wojciech Mirowski
  • Computer Science
    1st International Workshop on Multimodal Understanding and Learning for Embodied Applications - MULEA '19
  • 2019
This talk covers the author's own journey toward understanding navigation by building deep reinforcement learning agents, starting from learning to control a simple agent that can explore and memorise large 3D mazes, to designing agents with a read-write memory that can generalise to unseen mazes from a single traversal.

Domain Adaptation of Learned Features for Visual Localization

This work proposes a few-shot domain adaptation framework for learned local features that deals with varying conditions in visual localization, and demonstrates the superior performance over baselines, while using a scarce number of training examples from the target domain.

Looking Farther in Parametric Scene Parsing with Ground and Aerial Imagery

A novel architecture is proposed, Unified, that combines features from both aerial and ground imagery to infer scene attributes and shows the importance of aerial imagery in understanding road scenes, especially in regions farther away from the ego-vehicle.

SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference

A modern scalable reinforcement learning agent called SEED (Scalable, Efficient Deep-RL), which is able to train on millions of frames per second and lower the cost of experiments compared to current methods with a simple architecture that features centralized inference and an optimized communication layer.
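The centralized-inference pattern that SEED describes (actors send observations to a central learner, which runs one batched forward pass for all of them) can be sketched roughly as follows. This is a toy NumPy illustration with made-up names and a linear "model", not the SEED API:

```python
import numpy as np

def batched_inference(model_w, observations):
    """Central inference: one matrix multiply over a whole batch of
    actor observations, instead of one forward pass per actor."""
    logits = observations @ model_w            # (n_actors, n_actions)
    return logits.argmax(axis=1)               # greedy action per actor

rng = np.random.default_rng(2)
model_w = rng.normal(size=(8, 4))              # shared policy weights
obs_batch = rng.normal(size=(16, 8))           # 16 actors' observations
actions = batched_inference(model_w, obs_batch)
```

Batching all actors through the shared weights is what lets a single accelerator serve many environment workers; the real system adds an optimized communication layer around this loop.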

Revisiting Street-to-Aerial View Image Geo-localization and Orientation Estimation

It is shown that improvements in metric learning techniques significantly boost the performance regardless of the alignment, and a novel method to estimate the orientation/alignment between a pair of cross-view images with unknown alignment information is proposed.

Offline RL With Resource Constrained Online Deployment

This work introduces the use of transfer learning to address the performance gap between policies trained using the full offline dataset and policies trained using limited features, and proposes a data collection procedure: Resource Constrained-Datasets for RL (RC-D4RL).

References

SHOWING 1-10 OF 51 REFERENCES

Learning To Follow Directions in Street View

An instruction-following task that requires all of the above, and which combines the practicality of simulated environments with the challenges of ambiguous, noisy real world data, is proposed.

Target-driven visual navigation in indoor scenes using deep reinforcement learning

This paper proposes an actor-critic model whose policy is a function of the goal as well as the current state, which allows better generalization and proposes the AI2-THOR framework, which provides an environment with high-quality 3D scenes and a physics engine.
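The key idea of that actor-critic model is that the policy takes the goal as an input alongside the current state, so one network covers many navigation targets. A minimal NumPy sketch (layer sizes and names are illustrative, not the paper's architecture):

```python
import numpy as np

def goal_conditioned_policy(state, goal, W1, W2):
    """Toy goal-conditioned policy: the goal is concatenated with the
    state, so the same weights serve every navigation target."""
    x = np.concatenate([state, goal])          # joint state-goal input
    h = np.maximum(0.0, W1 @ x)                # ReLU hidden layer
    logits = W2 @ h                            # one logit per action
    e = np.exp(logits - logits.max())          # numerically stable softmax
    return e / e.sum()                         # action probabilities

rng = np.random.default_rng(0)
state, goal = rng.normal(size=8), rng.normal(size=8)
W1, W2 = rng.normal(size=(32, 16)), rng.normal(size=(4, 32))
probs = goal_conditioned_policy(state, goal, W1, W2)
```

Because the goal is an input rather than baked into the weights, changing targets only changes the input vector, which is what enables the generalization the abstract refers to.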

The StreetLearn Environment and Dataset

To support and validate research in end-to-end navigation, StreetLearn is presented: an interactive, first-person, partially-observed visual environment that uses Google Street View for its photographic content and broad coverage, and performance baselines for a challenging goal-driven navigation task.
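The goal-driven navigation task such environments pose follows the usual reset/step interaction loop. A self-contained toy environment in that shape (illustrative only; the actual StreetLearn API may differ):

```python
import numpy as np

class ToyNavEnv:
    """Minimal goal-driven grid navigation with a reset/step interface,
    sketching the task structure (not the StreetLearn API)."""

    def __init__(self, size=5):
        self.size = size

    def reset(self):
        self.pos = np.array([0, 0])
        self.goal = np.array([self.size - 1, self.size - 1])
        return self.pos.copy(), self.goal.copy()

    def step(self, action):
        # actions: 0=up, 1=down, 2=left, 3=right
        moves = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}
        self.pos = np.clip(self.pos + moves[action], 0, self.size - 1)
        done = bool((self.pos == self.goal).all())
        reward = 1.0 if done else -0.01   # sparse goal reward, small step cost
        return self.pos.copy(), reward, done

env = ToyNavEnv()
obs, goal = env.reset()
obs, reward, done = env.step(3)           # move right
```

The sparse goal reward plus step cost is what makes such tasks "challenging goal-driven navigation": the agent must find the goal with almost no intermediate signal.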

Visual Representations for Semantic Target Driven Navigation

This work proposes to use semantic segmentation and detection masks as observations obtained by state-of-the-art computer vision algorithms and use a deep network to learn navigation policies on top of representations that capture spatial layout and semantic contextual cues.

Learning to Navigate in Cities Without a Map

This work proposes a dual pathway architecture that allows locale-specific features to be encapsulated, while still enabling transfer to multiple cities, and presents an interactive navigation environment that uses Google StreetView for its photographic content and worldwide coverage.

Transfer Deep Reinforcement Learning in 3D Environments: An Empirical Study

It is shown that the features learnt by the navigation network can be effectively utilized to transfer knowledge between a diverse set of tasks, such as object collection, deathmatch, and self-localization.

The AdobeIndoorNav Dataset: Towards Deep Reinforcement Learning based Real-world Indoor Robot Visual Navigation

This paper presents a new dataset collected in the real world to facilitate research in DRL-based visual navigation, providing high-quality visual inputs with real-world scene complexity to the robot at dense grid locations.

Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation

A novel Reinforced Cross-Modal Matching (RCM) approach that enforces cross-modal grounding both locally and globally via reinforcement learning (RL) is introduced, along with a Self-Supervised Imitation Learning (SIL) method that explores unseen environments by imitating the agent's own past good decisions.

Deep reinforcement learning with successor features for navigation across similar environments

This paper proposes a successor-feature-based deep reinforcement learning algorithm that can learn to transfer knowledge from previously mastered navigation tasks to new problem instances and substantially decreases the required learning time after the first task instance has been solved.
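The successor-feature decomposition behind this transfer result factors the action-value function as Q(s, a) = psi(s, a) . w: the successor features psi are shared across tasks, so a new task in a similar environment only needs new reward weights w. A schematic NumPy sketch with made-up dimensions:

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions, n_feat = 5, 3, 4

# psi[s, a] = expected discounted sum of state features under the policy
psi = rng.normal(size=(n_states, n_actions, n_feat))

# Tasks A and B share psi and differ only in their reward weights w.
w_task_a = rng.normal(size=n_feat)
w_task_b = rng.normal(size=n_feat)

# Q-values for either task are a dot product with the shared successor
# features, so transfer to task B needs no re-learning of psi.
q_task_a = psi @ w_task_a            # shape (n_states, n_actions)
q_task_b = psi @ w_task_b

greedy_a = q_task_a.argmax(axis=1)   # greedy action per state, task A
```

This is why the abstract can claim a substantially shorter learning time after the first task: only the low-dimensional w must be refit for each new problem instance.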

One-Shot Reinforcement Learning for Robot Navigation with Interactive Replay

This work presents a method for learning to navigate, to a fixed goal and in a known environment, on a mobile robot that leverages an interactive world model built from a single traversal of the environment, a pre-trained visual feature encoder, and stochastic environmental augmentation.
...