Corpus ID: 67787705

World Discovery Models

@article{Azar2019WorldDM,
  title={World Discovery Models},
  author={Mohammad Gheshlaghi Azar and Bilal Piot and Bernardo {\'A}vila Pires and Jean-Bastien Grill and Florent Altch{\'e} and R{\'e}mi Munos},
  journal={ArXiv},
  year={2019},
  volume={abs/1902.07685}
}
As humans, we are driven by a strong desire to seek novelty in our world. Upon observing a novel pattern, we are also capable of refining our understanding of the world based on the new information---humans can discover their world. The outstanding ability of the human mind for discovery has led to many breakthroughs in science, art and technology. Here we investigate the possibility of building an agent capable of discovering its world using modern AI technology. In particular we… 
Information is Power: Intrinsic Control via Information Capture
TLDR
It is argued that a compact and general learning objective is to minimize the entropy of the agent's state visitation estimated using a latent state-space model, and this agent learns to discover, represent, and exercise control of dynamic objects in a variety of partially observed environments sensed with visual observations without extrinsic reward.
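The entropy-minimization idea above can be made concrete with a small sketch (an illustration under assumptions, not the paper's implementation). It assumes a hypothetical encoder from a learned latent state-space model that maps observations to discrete latent-state indices; rewarding visits to already-frequent latent states then drives the visitation distribution toward low entropy.

```python
import numpy as np

class VisitationEntropyReward:
    """Illustrative sketch: intrinsic reward that favors low-entropy state visitation."""

    def __init__(self, num_latent_states: int):
        # Laplace-smoothed visit counts over discrete latent states.
        self.counts = np.ones(num_latent_states)

    def intrinsic_reward(self, z: int) -> float:
        # Empirical visitation probability of latent state z (from the assumed encoder).
        p = self.counts[z] / self.counts.sum()
        self.counts[z] += 1
        # log p(z) is highest for frequently visited states, so maximizing
        # this reward pushes the visitation distribution toward low entropy.
        return float(np.log(p))
```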
Probing Emergent Semantics in Predictive Agents via Question Answering
TLDR
By revealing the implicit knowledge of objects, quantities, properties and relations acquired by agents as they learn, question-conditional agent probing can stimulate the design and development of stronger predictive learning objectives.
BYOL-Explore: Exploration by Bootstrapped Prediction
TLDR
It is shown that BYOL-Explore is effective in DM-HARD-8, a challenging partially-observable continuous-action hard-exploration benchmark with visually-rich 3-D environments and achieves superhuman performance on the ten hardest exploration games in Atari while having a much simpler design than other competitive agents.
Action and Perception as Divergence Minimization
TLDR
A unified objective for action and perception of intelligent agents is introduced, and interpreting the target distribution as a latent variable model suggests powerful world models as a path toward highly adaptive agents that seek large niches in their environments, rendering task rewards optional.
Effective, interpretable algorithms for curiosity automatically discovered by evolutionary search
TLDR
Two novel curiosity algorithms are found that perform on par with or better than human-designed published curiosity algorithms in domains as disparate as grid navigation with image input, acrobot, lunar lander, MuJoCo ant and MuJoCo hopper.
Meta-learning curiosity algorithms
TLDR
This work proposes a strategy for encoding curiosity algorithms as programs in a domain-specific language and searching, during a meta-learning phase, for algorithms that enable RL agents to perform well in new domains.
Learning World Graphs to Accelerate Hierarchical Reinforcement Learning
TLDR
A thorough ablation study evaluates the proposed graph abstraction over the environment structure, showing that it accelerates learning of these tasks and that the framework holds significant advantages in performance and efficiency over baselines that lack world graph knowledge.
Improved Sample Complexity for Incremental Autonomous Exploration in MDPs
TLDR
A novel model-based approach is introduced that interleaves discovering new states from s0 and improving the accuracy of a model estimate used to compute goal-conditioned policies; it is the first algorithm that can return an ε/c_min-optimal policy for any cost-sensitive shortest-path problem defined on the L-reachable states with minimum cost c_min.
Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning
Learning a good representation is an essential component for deep reinforcement learning (RL). Representation learning is especially important in multitask and partially observable settings where…
...
...

References

SHOWING 1-10 OF 68 REFERENCES
Neural Predictive Belief Representations
TLDR
It is shown that for CPC, multi-step predictions and action-conditioning are critical for accurate belief representations in visually complex environments, and that all three methods are able to learn belief representations of the environment.
Learning and exploration in action-perception loops
TLDR
How this work elucidates the explorative behaviors of animals and humans, its relationship to other computational models of behavior, and its potential application to experimental design, such as closed-loop neurophysiology studies, are discussed.
Learning to Play with Intrinsically-Motivated Self-Aware Agents
TLDR
This work proposes a "world-model" network that learns to predict the dynamic consequences of the agent's actions, and demonstrates that this policy causes the agent to explore novel and informative interactions with its environment, leading to the generation of a spectrum of complex behaviors.
Curiosity-Driven Exploration by Self-Supervised Prediction
TLDR
This work formulates curiosity as the error in an agent's ability to predict the consequence of its own actions in a visual feature space learned by a self-supervised inverse dynamics model, which scales to high-dimensional continuous state spaces like images, bypasses the difficulties of directly predicting pixels, and ignores the aspects of the environment that cannot affect the agent.
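As a rough illustration of this curiosity signal, the sketch below computes an intrinsic reward as the forward-model prediction error in a learned feature space (ICM-style). It is a sketch under assumptions, not the authors' exact architecture: phi stands in for an encoder that would be trained jointly with an inverse-dynamics head.

```python
import torch
import torch.nn as nn

class ForwardModel(nn.Module):
    """Predicts the next feature vector from current features and the action taken."""

    def __init__(self, feat_dim: int, num_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + num_actions, 256), nn.ReLU(),
            nn.Linear(256, feat_dim),
        )

    def forward(self, phi_s, action_onehot):
        return self.net(torch.cat([phi_s, action_onehot], dim=-1))

def curiosity_reward(forward_model, phi, obs, next_obs, action_onehot):
    # Intrinsic reward = prediction error of the forward model in feature space,
    # so pixel-level details the agent cannot influence are largely ignored.
    with torch.no_grad():
        phi_s, phi_next = phi(obs), phi(next_obs)
        phi_pred = forward_model(phi_s, action_onehot)
        return 0.5 * (phi_pred - phi_next).pow(2).sum(dim=-1)
```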
World Models
TLDR
This work explores building generative neural network models of popular reinforcement learning environments; by using features extracted from the world model as inputs to an agent, it can train a very compact and simple policy that solves the required task.
Learning Awareness Models
TLDR
The setting of an agent with a fixed body interacting with an unknown and uncertain external world is considered, and it is demonstrated that even when the body is no longer in contact with an object, the latent variables of the dynamics model continue to represent its shape.
Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning
TLDR
This work proposes to learn a model of the MDP transition probabilities concurrently with the policy, and to form intrinsic rewards that approximate the KL-divergence of the true transition probabilities from the learned model, which results in using surprisal as intrinsic motivation.
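A minimal sketch of that surprisal signal, assuming a learned Gaussian dynamics model rather than the paper's exact estimator: the intrinsic reward is the negative log-likelihood (surprisal) of the observed next state under the learned model, so poorly modeled transitions are rewarded.

```python
import torch
import torch.nn as nn

class GaussianDynamics(nn.Module):
    """Learned model p_theta(s' | s, a) parameterized as a diagonal Gaussian."""

    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 2 * state_dim),  # predicts mean and log-std
        )

    def log_prob(self, s, a, s_next):
        mean, log_std = self.net(torch.cat([s, a], dim=-1)).chunk(2, dim=-1)
        log_std = log_std.clamp(-5.0, 2.0)  # keep the predicted variance numerically sane
        dist = torch.distributions.Normal(mean, log_std.exp())
        return dist.log_prob(s_next).sum(dim=-1)

def surprisal_reward(model: GaussianDynamics, s, a, s_next):
    # High reward where the learned model assigns low probability to the
    # observed transition, i.e. where the agent is surprised.
    with torch.no_grad():
        return -model.log_prob(s, a, s_next)
```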
A nice surprise? Predictive processing and the active pursuit of novelty
Recent work in cognitive and computational neuroscience depicts human brains as devices that minimize prediction error signals: signals that encode the difference between actual and expected sensory…
Observe and Look Further: Achieving Consistent Performance on Atari
TLDR
This paper proposes an algorithm that addresses three key challenges that any algorithm needs to master in order to perform well on all games: processing diverse reward distributions, reasoning over long time horizons, and exploring efficiently.
Bayesian surprise attracts human attention
...
...