Corpus ID: 218665288

Simple Sensor Intentions for Exploration

Tim Hertweck, Martin A. Riedmiller, Michael Bloesch, Jost Tobias Springenberg, Noah Siegel, Markus Wulfmeier, Roland Hafner, Nicolas Manfred Otto Heess
Modern reinforcement learning algorithms can learn solutions to increasingly difficult control problems while at the same time reducing the amount of prior knowledge needed for their application. One of the remaining challenges is the definition of reward schemes that appropriately facilitate exploration without biasing the solution in undesirable ways, and that can be implemented on real robotic systems without expensive instrumentation. In this paper we focus on a setting in which goal tasks…
Towards General and Autonomous Learning of Core Skills: A Case Study in Locomotion
This paper develops a learning framework that can learn sophisticated locomotion behaviour for a wide spectrum of legged robots, such as bipeds, tripeds, quadrupeds and hexapods, including wheeled variants, and demonstrates that the same algorithm can rapidly learn diverse and reusable locomotion skills without any platform-specific adjustments or additional instrumentation of the learning setup.
Collect & Infer - a fresh look at data-efficient Reinforcement Learning
This position paper proposes a fresh look at Reinforcement Learning from the perspective of data-efficiency, explicitly modelling RL as two separate but interconnected processes concerned with data collection and knowledge inference respectively, and argues that true data-efficiency can only be achieved through careful consideration of both aspects.
Is Curiosity All You Need? On the Utility of Emergent Behaviours from Curious Exploration
Curiosity-like reward schemes have been used in different ways to facilitate exploration in sparse-reward tasks or to pre-train policy networks before fine-tuning them on difficult downstream tasks (Sekar et al., 2020).
Representation Matters: Improving Perception and Exploration for Robotics
This work systematically evaluates a number of common learnt and hand-engineered representations in the context of three robotics tasks: lifting, stacking and pushing of 3D blocks, to serve as a step towards a more systematic understanding of what makes a good representation for control in robotics.
Hindsight Experience Replay
A novel technique is presented which allows sample-efficient learning from rewards that are sparse and binary, thereby avoiding the need for complicated reward engineering; it may be seen as a form of implicit curriculum.
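The core mechanism of Hindsight Experience Replay is goal relabelling: failed transitions are stored again with their goal replaced by a state the agent actually reached later in the episode, so sparse binary rewards become informative. A minimal sketch of the "future" relabelling strategy, with illustrative function names not taken from the paper's code:

```python
import random

def her_relabel(episode, reward_fn, k=4):
    """Hindsight relabelling (sketch): for each transition, additionally
    store k copies whose goal is replaced by a state reached later in
    the same episode. `episode` is a list of (state, action, next_state,
    goal) tuples; `reward_fn(next_state, goal)` is the sparse reward,
    e.g. 1.0 when the goal is achieved and 0.0 otherwise."""
    relabeled = []
    for t, (s, a, s_next, goal) in enumerate(episode):
        # original transition with its (likely zero) sparse reward
        relabeled.append((s, a, s_next, goal, reward_fn(s_next, goal)))
        # sample k achieved goals from the remainder of the episode
        future = episode[t:]
        for _ in range(k):
            _, _, achieved, _ = random.choice(future)
            relabeled.append((s, a, s_next, achieved,
                              reward_fn(s_next, achieved)))
    return relabeled
```

Even when the original goal is never reached, some relabelled copies use an achieved state as the goal and therefore carry a reward of 1.0, which is what gives the learner a gradient signal.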
Learning by Playing - Solving Sparse Reward Tasks from Scratch
The key idea behind the method is that active (learned) scheduling and execution of auxiliary policies allows the agent to efficiently explore its environment, enabling it to excel at sparse-reward RL.
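The scheduling idea can be illustrated with a toy epsilon-greedy scheduler over auxiliary intentions; this is a hedged sketch, not the paper's learned scheduler, and the intention names and value estimates are made up for illustration:

```python
import random

def schedule_intention(q_main, intentions, eps=0.3):
    """Scheduler sketch: with probability eps pick a random auxiliary
    intention (exploration); otherwise pick the intention whose execution
    is currently estimated to yield the most main-task return.
    q_main maps intention name -> estimated main-task return."""
    if random.random() < eps:
        return random.choice(intentions)
    return max(intentions, key=lambda i: q_main[i])
```

With exploration disabled (`eps=0`), the scheduler greedily executes whichever auxiliary behaviour currently helps the sparse main task most.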
Simultaneously Learning Vision and Feature-based Control Policies for Real-world Ball-in-a-Cup
A method for fast training of vision-based control policies on real robots that allows auxiliary task policies to utilize task features available only at training time, and shows that the task can be learned from scratch, i.e., with no transfer from simulation and no imitation learning.
Reinforcement Learning with Unsupervised Auxiliary Tasks
This paper significantly outperforms the previous state-of-the-art on Atari, averaging 880% expert human performance, and on a challenging suite of first-person, three-dimensional Labyrinth tasks, leading to a mean speedup in learning of 10× and averaging 87% expert human performance on Labyrinth.
Emergence of Locomotion Behaviours in Rich Environments
This paper explores how a rich environment can help to promote the learning of complex behaviour, and finds that this encourages the emergence of robust behaviours that perform well across a suite of tasks.
The Intentional Unintentional Agent: Learning to Solve Many Continuous Control Tasks Simultaneously
It is shown that the Intentional Unintentional agent not only learns to solve many tasks simultaneously but also learns faster than agents that target a single task at a time.
Diversity is All You Need: Learning Skills without a Reward Function
DIAYN ("Diversity is All You Need") is proposed, a method for learning useful skills without a reward function; it learns skills by maximizing an information-theoretic objective using a maximum-entropy policy.
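DIAYN's information-theoretic objective reduces, per step, to an intrinsic reward of the form log q(z|s) − log p(z): a skill z is rewarded for visiting states from which a discriminator can infer which skill is acting. A minimal tabular sketch, assuming a count-based discriminator and a uniform skill prior (the full method uses a learned neural discriminator and an entropy-regularized policy, both omitted here):

```python
import math
from collections import defaultdict

class TabularDIAYN:
    """Sketch of the DIAYN intrinsic reward with a tabular discriminator
    q(z|s) estimated from Laplace-smoothed visit counts per skill."""

    def __init__(self, n_skills):
        self.n_skills = n_skills
        self.counts = defaultdict(lambda: [1.0] * n_skills)

    def observe(self, state, skill):
        self.counts[state][skill] += 1.0

    def reward(self, state, skill):
        c = self.counts[state]
        q = c[skill] / sum(c)      # discriminator posterior q(z|s)
        p = 1.0 / self.n_skills    # uniform skill prior p(z)
        return math.log(q) - math.log(p)
```

A skill that reliably visits states no other skill visits gets a positive reward there, which is exactly the pressure toward diverse, distinguishable skills.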
Control What You Can: Intrinsically Motivated Task-Planning Agent
This work combines several task-level planning agent structures (backtracking search on the task graph, probabilistic road-maps, allocation of search efforts) with intrinsic motivation to achieve learning from scratch.
Deep Successor Reinforcement Learning
DSR is presented, which generalizes Successor Representations within an end-to-end deep reinforcement learning framework and has several appealing properties, including increased sensitivity to distal reward changes due to the factorization of reward and world dynamics, and the ability to extract bottleneck states given successor maps trained under a random policy.
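The factorization behind this sensitivity to reward changes is that if the reward is linear in state features, r(s) = φ(s)·w, and ψ(s, a) accumulates discounted expected features (the successor features), then Q(s, a) = ψ(s, a)·w; changing the reward weights w re-prices every action without relearning dynamics. A minimal sketch with illustrative arrays:

```python
import numpy as np

def successor_q_values(psi, w):
    """Successor-representation factorization (sketch):
    psi: (n_actions, d) successor features for one state,
    w:   (d,) reward weights with r(s) = phi(s) @ w.
    Returns Q(s, a) = psi(s, a) @ w for every action."""
    return psi @ w
```

Swapping in a new `w` instantly changes which action is greedy, which is the "distal reward change" property the summary refers to.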
Hierarchical Relative Entropy Policy Search
This work defines the problem of learning sub-policies in continuous state-action spaces as finding a hierarchical policy composed of a high-level gating policy that selects low-level sub-policies for execution by the agent, and treats the sub-policies as latent variables, which allows the update information to be distributed between them.
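The hierarchical structure described above is a mixture of sub-policies: π(a|s) = Σ_z π(z|s)·π(a|s, z), with z latent. A minimal action-selection sketch, assuming callables for the gating distribution and sub-policies (the latent-variable update machinery of the paper is omitted):

```python
import random

def hierarchical_action(state, gating, sub_policies):
    """Mixture-of-sub-policies sketch: the gating policy returns
    pi(z|s) over sub-policy indices; the sampled sub-policy then
    emits the action. Returns (action, chosen index z)."""
    probs = gating(state)  # list of probabilities, one per sub-policy
    z = random.choices(range(len(sub_policies)), weights=probs)[0]
    return sub_policies[z](state), z
```

Treating z as latent means the high-level and low-level policies can be updated jointly from the same trajectories, with responsibility for each sample shared among sub-policies.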