Maximizing Information Gain in Partially Observable Environments via Prediction Reward
Yash Satsangi, Sungsu Lim, Shimon Whiteson, Frans A. Oliehoek and Martha White
Information gathering in a partially observable environment can be formulated as a reinforcement learning (RL) problem in which the reward depends on the agent's uncertainty. For example, the reward can be the negative entropy of the agent's belief over an unknown (or hidden) variable. Typically, the rewards of an RL agent are defined as a function of state-action pairs rather than of the agent's belief; this hinders the direct application of deep RL methods to such tasks…
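The belief-entropy reward described above can be sketched in a few lines. This is a minimal illustration over a discrete belief; the function name and representation are assumptions for the example, not the paper's implementation.

```python
import numpy as np

def entropy_reward(belief):
    """Negative Shannon entropy of a discrete belief distribution.

    Higher (less negative) reward means lower uncertainty about
    the hidden variable.
    """
    b = np.asarray(belief, dtype=float)
    b = b / b.sum()                        # normalize defensively
    nz = b[b > 0]                          # convention: 0 * log 0 = 0
    return float(np.sum(nz * np.log(nz)))  # equals -H(b)
```

With a uniform belief over four values the reward is -log 4 ≈ -1.39; it rises toward 0 as the belief concentrates on a single value.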
Adapting Behaviour via Intrinsic Reward: A Survey and Empirical Study
The interaction between reward and prediction learners is discussed, and the importance of introspective prediction learners is highlighted: those that increase their rate of learning when progress is possible and decrease it when it is not.
Multi-agent active perception with prediction rewards
This paper models multi-agent active perception as a decentralized partially observable Markov decision process (Dec-POMDP) with a convex centralized prediction reward, and proves that by introducing individual prediction actions for each agent, the problem is converted into a standard Dec-POMDP with a decentralized prediction reward.
Reinforcement Learning with Efficient Active Feature Acquisition
A model-based reinforcement learning framework that learns an active feature acquisition policy, addressing the exploration-exploitation trade-off during execution; it outperforms conventional baselines and yields policies with greater cost efficiency.
Adapting Behaviour via Intrinsic Reward
Learning about many things can provide numerous benefits to a reinforcement learning system. This work studies how to behave so as to best learn a collection of predictions in parallel in the reinforcement learning setting.
RECON: Rapid Exploration for Open-World Navigation with Latent Goal Models
A robotic learning system for autonomous navigation in diverse environments with a non-parametric map that reflects the connectivity of the environment but does not require geometric reconstruction or localization, and a latent variable model of distances and actions for efficiently constructing and traversing this map.


Decision-theoretic planning under uncertainty with information rewards for active cooperative perception
This work presents the POMDP with Information Rewards (POMDP-IR) modeling framework, which rewards an agent for reaching a certain level of belief regarding a state feature, and demonstrates their use for active cooperative perception scenarios.
Reinforcement Learning with Unsupervised Auxiliary Tasks
This paper significantly outperforms the previous state-of-the-art on Atari, averaging 880% expert human performance, and on a challenging suite of first-person, three-dimensional Labyrinth tasks, leading to a mean speedup in learning of 10x and averaging 87% expert human performance on Labyrinth.
Curiosity-Driven Exploration by Self-Supervised Prediction
This work forms curiosity as the error in an agent's ability to predict the consequence of its own actions in a visual feature space learned by a self-supervised inverse dynamics model, which scales to high-dimensional continuous state spaces like images, bypasses the difficulties of directly predicting pixels, and ignores the aspects of the environment that cannot affect the agent.
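The curiosity signal summarized above reduces to a forward-model prediction error in feature space. The sketch below assumes feature encodings are already available and treats the forward model as a plain callable; names and the 1/2 scaling are illustrative, not the paper's code.

```python
import numpy as np

def curiosity_reward(phi_s, phi_next, forward_model):
    """Intrinsic reward as forward-model prediction error in a learned
    feature space: a large error marks a novel transition, hence high
    curiosity. `forward_model` maps current features to predicted
    next-step features."""
    pred = forward_model(np.asarray(phi_s, dtype=float))
    return 0.5 * float(np.sum((pred - np.asarray(phi_next, dtype=float)) ** 2))
```

With a toy identity forward model (predicting no feature change), the reward is zero exactly when the features do not change and grows with the surprise.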
Deep Variational Reinforcement Learning for POMDPs
Deep variational reinforcement learning (DVRL) is proposed, which introduces an inductive bias that allows an agent to learn a generative model of the environment and perform inference in that model to effectively aggregate the available information.
A POMDP Extension with Belief-dependent Rewards
Partially Observable Markov Decision Processes (POMDPs) model sequential decision-making problems under uncertainty and partial observability. Unfortunately, some problems cannot be modeled with state-dependent rewards alone; this motivates extending POMDPs with belief-dependent rewards.
Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning
This paper develops a stochastic optimisation algorithm that allows for scalable information maximisation and empowerment-based reasoning directly from pixels to actions on the problem of intrinsically-motivated learning.
Online Active Perception for Partially Observable Markov Decision Processes with Limited Budget
  • Mahsa Ghasemi, U. Topcu
  • Computer Science, Engineering
    2019 IEEE 58th Conference on Decision and Control (CDC)
  • 2019
This work considers a setting in which at runtime an agent is capable of gathering information under a limited budget and proposes a generalized greedy strategy that selects a subset of information sources with near-optimality guarantees on uncertainty reduction.
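The generalized greedy strategy described above can be sketched as repeated marginal-gain maximization under a cardinality budget; the near-optimality guarantee assumes the utility (expected uncertainty reduction) is monotone submodular. The function and variable names here are illustrative assumptions.

```python
def greedy_select(sources, budget, utility):
    """Greedy subset selection: repeatedly add the information source
    with the largest marginal utility until the budget is exhausted
    or no source improves the utility.

    `utility` maps a set of sources to a scalar (e.g. expected
    uncertainty reduction); for monotone submodular utilities the
    greedy set is within a (1 - 1/e) factor of optimal."""
    chosen = set()
    remaining = set(sources)
    while len(chosen) < budget and remaining:
        gain = lambda s: utility(chosen | {s}) - utility(chosen)
        best = max(remaining, key=gain)
        if gain(best) <= 0:          # no source helps any more
            break
        chosen.add(best)
        remaining.remove(best)
    return chosen
```

A coverage-style utility (number of distinct facts the chosen sources reveal) is a standard toy instance of such a monotone submodular objective.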
Deep Recurrent Q-Learning for Partially Observable MDPs
The effects of adding recurrency to a Deep Q-Network are investigated by replacing the first post-convolutional fully-connected layer with a recurrent LSTM, which successfully integrates information through time and replicates DQN's performance on standard Atari games and on partially observed equivalents featuring flickering game screens.
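The architectural change above, swapping a fully-connected layer for a recurrent one so Q-values can integrate a history of observations, can be sketched minimally. A simple tanh RNN cell stands in for the LSTM here, and the class and weight names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

class RecurrentQHead:
    """Minimal recurrent Q-head: a tanh RNN cell in place of the first
    post-convolutional fully-connected layer, so that Q-values depend
    on the history of (possibly flickering) observations, not just the
    current frame."""
    def __init__(self, obs_dim, hidden_dim, num_actions):
        s = 0.1
        self.W_in = rng.normal(0.0, s, (hidden_dim, obs_dim))
        self.W_h = rng.normal(0.0, s, (hidden_dim, hidden_dim))
        self.W_q = rng.normal(0.0, s, (num_actions, hidden_dim))
        self.h = np.zeros(hidden_dim)  # carried across time steps

    def step(self, obs):
        self.h = np.tanh(self.W_in @ obs + self.W_h @ self.h)
        return self.W_q @ self.h       # one Q-value per action
```

Because the hidden state persists between calls, feeding the same blank observation twice generally yields different Q-values, which is precisely the memory a feed-forward head lacks.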
Diversity is All You Need: Learning Skills without a Reward Function
DIAYN ("Diversity is All You Need"), a method for learning useful skills without a reward function, is proposed; it learns skills by maximizing an information-theoretic objective using a maximum entropy policy.
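The information-theoretic objective mentioned above yields a per-step pseudo-reward of the form log q(z|s) - log p(z): the skill discriminator's log-probability of the active skill, baselined against a uniform skill prior. The sketch below takes raw discriminator logits as given; the names are illustrative assumptions.

```python
import numpy as np

def diversity_reward(disc_logits, skill, num_skills):
    """DIAYN-style pseudo-reward log q(z|s) - log p(z), where q is a
    learned skill discriminator (represented here by its raw logits
    over skills) and p(z) is uniform over `num_skills` skills."""
    logits = np.asarray(disc_logits, dtype=float)
    log_q = logits - np.log(np.sum(np.exp(logits)))   # log-softmax
    return float(log_q[skill] + np.log(num_skills))   # -log p(z) term
```

The reward is zero when the discriminator is uninformative (uniform logits) and positive when the visited state makes the active skill easy to identify, which is what pushes skills to be diverse.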
Control of Memory, Active Perception, and Action in Minecraft
These tasks are designed to emphasize, in a controllable manner, issues that pose challenges for RL methods, including partial observability, delayed rewards, high-dimensional visual observations, and the need to use active perception correctly in order to perform well.