Target-driven visual navigation in indoor scenes using deep reinforcement learning

@inproceedings{Zhu2017TargetdrivenVN,
  title={Target-driven visual navigation in indoor scenes using deep reinforcement learning},
  author={Yuke Zhu and Roozbeh Mottaghi and Eric Kolve and Joseph J. Lim and Abhinav Gupta and Li Fei-Fei and Ali Farhadi},
  booktitle={2017 IEEE International Conference on Robotics and Automation (ICRA)},
  year={2017},
  pages={3357-3364}
}
Two less addressed issues of deep reinforcement learning are (1) lack of generalization capability to new goals, and (2) data inefficiency, i.e., the model requires several (and often costly) episodes of trial and error to converge, which makes it impractical to be applied to real-world scenarios. In this paper, we address these two issues and apply our model to target-driven visual navigation. To address the first issue, we propose an actor-critic model whose policy is a function of the goal… 
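The abstract describes a policy that conditions on both the current observation and the navigation goal, which is what lets one network generalize to new targets. Below is a minimal, hedged sketch of that idea in PyTorch, assuming image inputs and a discrete action space; the encoder, layer sizes, and concatenation-based fusion are illustrative placeholders, not the authors' exact architecture (the paper uses shared siamese features with scene-specific policy layers).

```python
# Minimal sketch of a goal-conditioned actor-critic: the policy is a function of
# both the current observation and the target image, so the same weights can be
# reused for new goals. Layer sizes and the encoder are illustrative assumptions.
import torch
import torch.nn as nn

class GoalConditionedActorCritic(nn.Module):
    def __init__(self, num_actions: int, embed_dim: int = 512):
        super().__init__()
        # Shared (siamese-style) encoder applied to both the observation and the target image.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(embed_dim), nn.ReLU(),
        )
        # Fuse the two embeddings, then branch into policy and value heads.
        self.fuse = nn.Sequential(nn.Linear(2 * embed_dim, embed_dim), nn.ReLU())
        self.policy_head = nn.Linear(embed_dim, num_actions)
        self.value_head = nn.Linear(embed_dim, 1)

    def forward(self, observation: torch.Tensor, target: torch.Tensor):
        h = self.fuse(torch.cat([self.encoder(observation), self.encoder(target)], dim=-1))
        return torch.distributions.Categorical(logits=self.policy_head(h)), self.value_head(h)

# Usage: the goal is an input to the network rather than being baked into the weights.
model = GoalConditionedActorCritic(num_actions=4)
obs, goal = torch.rand(1, 3, 84, 84), torch.rand(1, 3, 84, 84)
dist, value = model(obs, goal)
action = dist.sample()
```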

Citations

Towards Generalization in Target-Driven Visual Navigation by Using Deep Reinforcement Learning
TLDR
This article proposes a novel architecture composed of two networks, both trained exclusively in simulation, that are designed to work together but trained separately to improve generalization in target-driven visual navigation.
Visual Navigation using Deep Reinforcement Learning
TLDR
An efficient neural network structure is proposed, which is capable of learning for multiple targets in multiple environments, and which surpasses the performance of state-of-the-art goal-oriented visual navigation methods from the literature.
Towards Target-Driven Visual Navigation in Indoor Scenes via Generative Imitation Learning
We present a target-driven navigation system to improve mapless visual navigation in indoor scenes. Our method takes a multi-view observation of a robot and a target image as inputs at each time step…
Collision Anticipation via Deep Reinforcement Learning for Visual Navigation
TLDR
A deep reinforcement learning approach that learns to navigate a scene to reach a given visual target while anticipating possible collisions with the environment, and that offers an interesting ability to generalize to visual targets never seen during training.
Vision-based Navigation Using Deep Reinforcement Learning
TLDR
This work proposes a novel learning architecture capable of navigating an agent, e.g. a mobile robot, to a target given by an image, and extends the batched A2C algorithm with auxiliary tasks designed to improve visual navigation performance.
Effective Deep Reinforcement Learning Setups for Multiple Goals on Visual Navigation
TLDR
For visual topological navigation, combining visual information from the current and goal states through a Hadamard product or a Gated-Attention module allows the network to learn near-optimal navigation policies; it is also shown empirically that the ϵ-categorical policy helps avoid local minima during training, which facilitates convergence to better results.
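The two fusion schemes named in the entry above can be written down compactly. The sketch below is illustrative only, assuming the current view is encoded as convolutional features and the goal as a flat embedding; the dimensions and module names are assumptions, not the cited paper's implementation.

```python
# Illustrative sketch of two ways to fuse current-state and goal-state features:
# a plain Hadamard (element-wise) product of equally sized embeddings, and a
# Gated-Attention variant where the goal embedding gates the visual feature channels.
import torch
import torch.nn as nn

class HadamardFusion(nn.Module):
    def forward(self, current_feat: torch.Tensor, goal_feat: torch.Tensor) -> torch.Tensor:
        # Both embeddings must share the same shape; fuse by element-wise product.
        return current_feat * goal_feat

class GatedAttentionFusion(nn.Module):
    def __init__(self, goal_dim: int, num_channels: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(goal_dim, num_channels), nn.Sigmoid())

    def forward(self, visual_feat: torch.Tensor, goal_emb: torch.Tensor) -> torch.Tensor:
        # visual_feat: (B, C, H, W) convolutional features of the current view.
        # goal_emb:    (B, goal_dim) embedding of the goal/target.
        g = self.gate(goal_emb).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        return visual_feat * g  # weight each channel by its relevance to the goal

emb_a, emb_b = torch.rand(2, 64), torch.rand(2, 64)
fused_flat = HadamardFusion()(emb_a, emb_b)
fused_conv = GatedAttentionFusion(goal_dim=32, num_channels=64)(torch.rand(2, 64, 7, 7), torch.rand(2, 32))
```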
Visual Navigation in Real-World Indoor Environments Using End-to-End Deep Reinforcement Learning
TLDR
A novel approach is presented that enables a direct deployment of the trained policy on real robots using a new powerful simulator capable of domain randomization and a tailored reward scheme fine-tuned on images collected from real-world environments.
Target-driven indoor visual navigation using inverse reinforcement learning
TLDR
This paper proposes to use inverse reinforcement learning to solve the visual navigation problem, which provides more accurate and efficient guidance for decision-making and learns a more effective reward function from less training data.
A deep Q network for robotic planning from image
TLDR
Deep Reinforcement Learning (DRL) is proposed to address a planning problem that differs from the traditional SLAM formulation, and a Q-CNN model is applied whose policy combines a Convolutional Neural Network (CNN) with Q-functions.
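As a rough illustration of the Q-CNN idea named above, the sketch below shows a CNN that maps an image observation to one Q-value per discrete action and plans by acting greedily; the architecture and shapes are assumptions for illustration, not the cited paper's model, and the training loop (replay buffer, target network) is omitted.

```python
# Minimal sketch of a deep Q network over images: the CNN outputs one Q-value
# per discrete action, and planning selects the highest-valued action.
import torch
import torch.nn as nn

class ImageQNetwork(nn.Module):
    def __init__(self, num_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(256), nn.ReLU(),
            nn.Linear(256, num_actions),  # one Q-value per action
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.net(image)

q_net = ImageQNetwork(num_actions=4)
q_values = q_net(torch.rand(1, 3, 84, 84))
greedy_action = q_values.argmax(dim=-1)  # act greedily with respect to the Q-values
```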
A Few Shot Adaptation of Visual Navigation Skills to New Observations using Meta-Learning
TLDR
This paper designs a policy architecture with latent features between the perception and inference networks, quickly adapts the perception network via meta-learning while freezing the inference network, and introduces a learning algorithm that enables rapid adaptation to new sensor configurations or target objects with a few shots.
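The adaptation recipe described above (adapt the perception network, keep the inference network frozen) can be illustrated with a small fine-tuning loop. This is a hedged sketch under assumed placeholder networks and a supervised few-shot loss; it is not the cited paper's meta-learning algorithm.

```python
# Sketch: freeze the inference (policy) network so the latent interface stays fixed,
# and fine-tune only the perception network on a few examples from the new domain.
import torch
import torch.nn as nn

perception = nn.Sequential(                     # maps images to 64-d latent features
    nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 64),
)
inference = nn.Sequential(nn.Linear(64, 4))     # maps latent features to action logits

for p in inference.parameters():                # frozen: only perception adapts
    p.requires_grad_(False)

optimizer = torch.optim.Adam(perception.parameters(), lr=1e-3)

# A few "shots" from the new observation domain: (image, reference action) pairs.
few_shot_images = torch.rand(8, 3, 64, 64)
few_shot_actions = torch.randint(0, 4, (8,))

for step in range(20):                          # quick adaptation over the few shots
    logits = inference(perception(few_shot_images))
    loss = nn.functional.cross_entropy(logits, few_shot_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```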

References

SHOWING 1-10 OF 55 REFERENCES
End-to-End Training of Deep Visuomotor Policies
TLDR
This paper develops a method that can be used to learn policies that map raw image observations directly to torques at the robot's motors, trained using a partially observed guided policy search method, with supervision provided by a simple trajectory-centric reinforcement learning method.
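As a rough sketch of a visuomotor policy of the kind described above, the code below maps a raw camera image plus joint state directly to per-joint torques. Shapes and layers are assumptions, and the guided policy search training procedure from the paper is not shown.

```python
# Illustrative end-to-end visuomotor policy: raw image + joint state -> motor torques.
import torch
import torch.nn as nn

class VisuomotorPolicy(nn.Module):
    def __init__(self, num_joints: int):
        super().__init__()
        self.vision = nn.Sequential(
            nn.Conv2d(3, 32, 7, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # compact visual feature vector
        )
        self.control = nn.Sequential(
            nn.Linear(32 + num_joints, 64), nn.ReLU(),
            nn.Linear(64, num_joints),              # one torque command per joint
        )

    def forward(self, image: torch.Tensor, joint_state: torch.Tensor) -> torch.Tensor:
        return self.control(torch.cat([self.vision(image), joint_state], dim=-1))

policy = VisuomotorPolicy(num_joints=7)
torques = policy(torch.rand(1, 3, 240, 240), torch.rand(1, 7))
```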
ViZDoom: A Doom-based AI research platform for visual reinforcement learning
TLDR
A novel test-bed platform for reinforcement learning research from raw visual information which employs the first-person perspective in a semi-realistic 3D world and confirms the utility of ViZDoom as an AI research platform and implies that visual reinforcement learning in 3D realistic first-person perspective environments is feasible.
Policy Distillation
TLDR
A novel method called policy distillation is presented that can be used to extract the policy of a reinforcement learning agent and train a new network that performs at the expert level while being dramatically smaller and more efficient.
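The distillation step described above can be sketched as training a small student network to match a larger teacher's action distribution with a KL loss on states drawn from the teacher's experience. Network sizes, the temperature, and the data below are illustrative assumptions, not the paper's exact setup.

```python
# Sketch of one policy-distillation update: the student matches the (sharpened)
# teacher action distribution by minimizing KL(teacher || student).
import torch
import torch.nn as nn
import torch.nn.functional as F

state_dim, num_actions, temperature = 16, 4, 0.01

teacher = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(), nn.Linear(256, num_actions))
student = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU(), nn.Linear(32, num_actions))
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

states = torch.rand(64, state_dim)  # stand-in for states from the teacher's replay memory

with torch.no_grad():
    teacher_probs = F.softmax(teacher(states) / temperature, dim=-1)  # sharpened teacher targets

student_log_probs = F.log_softmax(student(states), dim=-1)
loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")  # KL(teacher || student)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```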
Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning
TLDR
This work defines a novel method of multitask and transfer learning that enables an autonomous agent to learn how to behave in multiple tasks simultaneously, and then generalize its knowledge to new domains, and uses Atari games as a testing environment to demonstrate these methods.
High speed obstacle avoidance using monocular vision and reinforcement learning
TLDR
An approach is presented in which supervised learning is first used to estimate depths from single monocular images, and which is able to learn monocular vision cues that accurately estimate the relative depths of obstacles in a scene.
State of the Art Control of Atari Games Using Shallow Reinforcement Learning
TLDR
This paper systematically evaluates the importance of key representational biases encoded by DQN's network by proposing simple linear representations that make use of these concepts, and obtains a computationally practical feature set that achieves competitive performance to DQN in the ALE.
View-based Maps
TLDR
A mapping system based on retaining stereo views of the environment that are collected as the robot moves, which uses a vocabulary tree to propose candidate views, and a strong geometric filter to eliminate false positives.
Virtual Worlds as Proxy for Multi-Object Tracking Analysis
TLDR
This work proposes an efficient real-to-virtual world cloning method, and validate the approach by building and publicly releasing a new video dataset, called "Virtual KITTI", automatically labeled with accurate ground truth for object detection, tracking, scene and instance segmentation, depth, and optical flow.
Trajectory Optimization using Reinforcement Learning for Map Exploration
TLDR
This paper addresses the problem of how a robot should plan to explore an unknown environment and collect data in order to maximize the accuracy of the resulting map; it formulates exploration as a constrained optimization problem and uses reinforcement learning to find trajectories that lead to accurate maps.
Galileo: Perceiving Physical Object Properties by Integrating a Physics Engine with Deep Learning
TLDR
This study points towards an account of human vision with generative physical knowledge at its core, and various recognition models as helpers leading to efficient inference.