Maximizing Information Gain in Partially Observable Environments via Prediction Reward

@article{Satsangi2020MaximizingIG,
  title={Maximizing Information Gain in Partially Observable Environments via Prediction Reward},
  author={Yash Satsangi and Sungsu Lim and Shimon Whiteson and Frans A. Oliehoek and Martha White},
  journal={ArXiv},
  year={2020},
  volume={abs/2005.04912}
}
Information gathering in a partially observable environment can be formulated as a reinforcement learning (RL) problem where the reward depends on the agent's uncertainty. For example, the reward can be the negative entropy of the agent's belief over an unknown (or hidden) variable. Typically, the rewards of an RL agent are defined as a function of the state-action pairs and not as a function of the belief of the agent; this hinders the direct application of deep RL methods for such tasks…
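To make the belief-dependent reward concrete, here is a minimal sketch (an illustration only, not the paper's implementation; the function name and the discrete belief representation are assumptions) of a reward equal to the negative entropy of a belief vector:

```python
import numpy as np

def negative_entropy_reward(belief: np.ndarray) -> float:
    """Reward equal to the negative Shannon entropy of the agent's belief.

    `belief` is a probability vector over the possible values of the hidden
    variable; a more peaked (more certain) belief yields a higher reward.
    """
    p = np.clip(belief, 1e-12, 1.0)      # avoid log(0)
    return float(np.sum(p * np.log(p)))  # = -H(belief)

# A uniform belief is maximally uncertain; a peaked belief is rewarded more.
print(negative_entropy_reward(np.array([0.25, 0.25, 0.25, 0.25])))  # about -1.39
print(negative_entropy_reward(np.array([0.97, 0.01, 0.01, 0.01])))  # about -0.17
```

Because such a reward is a function of the belief rather than of the state-action pair, it cannot be plugged directly into standard deep RL pipelines, which is the difficulty the abstract refers to.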
Adapting Behaviour via Intrinsic Reward: A Survey and Empirical Study
The interaction between reward and prediction learners is discussed, and the importance of introspective prediction learners is highlighted: those that increase their rate of learning when progress is possible, and decrease it when it is not.
Multi-agent active perception with prediction rewards
This paper models multi-agent active perception as a decentralized partially observable Markov decision process (Dec-POMDP) with a convex centralized prediction reward and proves that, by introducing individual prediction actions for each agent, the problem is converted into a standard Dec-POMDP with a decentralized prediction reward.
Reinforcement Learning with Efficient Active Feature Acquisition
A model-based reinforcement learning framework that learns an active feature acquisition policy, addressing the exploration-exploitation problem during execution; it outperforms conventional baselines and results in policies with greater cost efficiency.
Adapting Behavior via Intrinsic Reward
Learning about many things can provide numerous benefits to a reinforcement learning system. How to behave to best learn a collection of predictions in parallel in the reinforcement learning setting…
RECON: Rapid Exploration for Open-World Navigation with Latent Goal Models
A robotic learning system for autonomous navigation in diverse environments, with a non-parametric map that reflects the connectivity of the environment but does not require geometric reconstruction or localization, and a latent variable model of distances and actions for efficiently constructing and traversing this map.

References

Showing 1-10 of 54 references
Decision-theoretic planning under uncertainty with information rewards for active cooperative perception
This work presents the POMDP with Information Rewards (POMDP-IR) modeling framework, which rewards an agent for reaching a certain level of belief regarding a state feature, and demonstrates its use in active cooperative perception scenarios.
Reinforcement Learning with Unsupervised Auxiliary Tasks
This paper significantly outperforms the previous state of the art on Atari, averaging 880% expert human performance, and on a challenging suite of first-person, three-dimensional Labyrinth tasks, achieving a mean speedup in learning of 10x and averaging 87% expert human performance on Labyrinth.
Curiosity-Driven Exploration by Self-Supervised Prediction
This work formulates curiosity as the error in an agent's ability to predict the consequences of its own actions in a visual feature space learned by a self-supervised inverse dynamics model; this scales to high-dimensional continuous state spaces such as images, bypasses the difficulties of directly predicting pixels, and ignores aspects of the environment that cannot affect the agent.
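As a rough illustration of this kind of intrinsic reward (not the cited paper's exact architecture; the class name, layer sizes, and the assumption of a discrete action space are hypothetical, and the self-supervised feature encoder is omitted), the curiosity signal can be computed as a forward model's prediction error in feature space:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ForwardModelCuriosity(nn.Module):
    """Intrinsic reward = error in predicting the next state's features."""

    def __init__(self, feat_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.n_actions = n_actions
        self.forward_model = nn.Sequential(
            nn.Linear(feat_dim + n_actions, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim),
        )

    def intrinsic_reward(self, phi_s, action, phi_s_next):
        # One-hot encode the discrete action and predict the next features.
        a = F.one_hot(action, self.n_actions).float()
        pred = self.forward_model(torch.cat([phi_s, a], dim=-1))
        # Reward is the squared prediction error for each transition.
        return 0.5 * (pred - phi_s_next).pow(2).sum(dim=-1)
```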
Deep Variational Reinforcement Learning for POMDPs
Deep variational reinforcement learning (DVRL) is proposed, which introduces an inductive bias that allows an agent to learn a generative model of the environment and perform inference in that model to effectively aggregate the available information.
A POMDP Extension with Belief-dependent Rewards
Partially Observable Markov Decision Processes (POMDPs) model sequential decision-making problems under uncertainty and partial observability. Unfortunately, some problems cannot be modeled with…
Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning
This paper develops a stochastic optimisation algorithm that allows for scalable information maximisation and empowerment-based reasoning directly from pixels to actions on the problem of intrinsically motivated learning.
Online Active Perception for Partially Observable Markov Decision Processes with Limited Budget
Mahsa Ghasemi, U. Topcu. 2019 IEEE 58th Conference on Decision and Control (CDC), 2019.
This work considers a setting in which, at runtime, an agent is capable of gathering information under a limited budget, and proposes a generalized greedy strategy that selects a subset of information sources with near-optimality guarantees on uncertainty reduction.
Deep Recurrent Q-Learning for Partially Observable MDPs
The effect of adding recurrence to a Deep Q-Network is investigated by replacing the first post-convolutional fully-connected layer with a recurrent LSTM, which successfully integrates information through time and replicates DQN's performance on standard Atari games and on partially observed equivalents featuring flickering game screens.
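A minimal sketch of that architectural change (the layer sizes and simplified convolutional stack here are assumptions, not the exact DRQN configuration):

```python
import torch
import torch.nn as nn

class RecurrentQNet(nn.Module):
    """DQN-style network whose first post-convolutional fully-connected
    layer is replaced by an LSTM, so Q-values can depend on history."""

    def __init__(self, n_actions: int, hidden: int = 256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.lstm = nn.LSTM(input_size=32 * 9 * 9, hidden_size=hidden,
                            batch_first=True)
        self.q_head = nn.Linear(hidden, n_actions)

    def forward(self, frames, hidden_state=None):
        # frames: (batch, time, 1, 84, 84) -- one grayscale frame per step
        b, t = frames.shape[:2]
        feats = self.conv(frames.flatten(0, 1)).view(b, t, -1)
        out, hidden_state = self.lstm(feats, hidden_state)
        return self.q_head(out), hidden_state  # Q-values per time step
```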
Diversity is All You Need: Learning Skills without a Reward Function
DIAYN ("Diversity is All You Need"), a method for learning useful skills without a reward function, is proposed; it learns skills by maximizing an information-theoretic objective using a maximum-entropy policy.
Control of Memory, Active Perception, and Action in Minecraft
These tasks are designed to emphasize, in a controllable manner, issues that pose challenges for RL methods, including partial observability, delayed rewards, high-dimensional visual observations, and the need to use active perception in a correct manner so as to perform well in the tasks.