• Corpus ID: 236447747

Human-Level Reinforcement Learning through Theory-Based Modeling, Exploration, and Planning

Authors: Pedro Tsividis, João Loula, Jake Burga, Nathan Foss, Andres Campero, Thomas Pouncy, Samuel J. Gershman, Joshua B. Tenenbaum
Reinforcement learning (RL) studies how an agent comes to achieve reward in an environment through interactions over time. Recent advances in machine RL have surpassed human expertise at the world’s oldest board games and many classic video games, but they require vast quantities of experience to learn successfully — none of today’s algorithms account for the human ability to learn so many different tasks, so quickly. Here we propose a new approach to this challenge based on a particularly… 
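The abstract's framing of RL as an agent achieving reward through interaction over time can be made concrete with a minimal sketch. The two-armed bandit environment and epsilon-greedy agent below are purely illustrative assumptions, not anything from the paper:

```python
import random

class BanditEnv:
    """Toy two-armed bandit: arm 1 pays off more often than arm 0."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.payoff = [0.2, 0.8]  # reward probability per arm

    def step(self, action):
        return 1.0 if self.rng.random() < self.payoff[action] else 0.0

def run_episode(env, steps=1000, epsilon=0.1, seed=1):
    """Epsilon-greedy agent: tracks the mean reward per arm, mostly exploits."""
    rng = random.Random(seed)
    counts = [0, 0]
    values = [0.0, 0.0]
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(2)                    # explore
        else:
            a = 0 if values[0] > values[1] else 1   # exploit current estimate
        r = env.step(a)
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]    # incremental mean update
        total += r
    return total / steps, values
```

After enough interaction the agent's value estimates favor the better arm; the vast-experience requirement the abstract criticizes shows up even here as the thousands of pulls needed to separate the two arms.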


Alchemy: A benchmark and analysis toolkit for meta-reinforcement learning agents

This work evaluates a pair of powerful RL agents on Alchemy and presents an in-depth analysis of one of these agents, providing validation for Alchemy as a challenging benchmark for meta-RL research.

Curious Exploration via Structured World Models Yields Zero-Shot Object Manipulation

This paper proposes using structured world models to incorporate relational inductive biases in the control loop, achieving sample-efficient and interaction-rich exploration in compositional multi-object environments. It also showcases that the self-reinforcing cycle between good models and good exploration opens up another avenue: zero-shot generalization to downstream tasks via model-based planning.

Learning Relational Rules from Rewards

This paper builds a simple model of relational policy learning based on a function approximator developed in RRL, and trains and tests the model in three Atari games that require considering an increasing number of potential relations.

Disentangling Abstraction from Statistical Pattern Matching in Human and Machine Learning

This work defines a novel methodology for building “task metamers” that closely match the statistics of the abstract tasks but use a different underlying generative process, and evaluates performance on both abstract and metamer tasks.

Target Languages (vs. Inductive Biases) for Learning to Act and Plan

A different learning approach is presented in which representations do not emerge from biases in a neural architecture but are learned over a given target language with known semantics; the aim is to make these ideas explicit and to illustrate them in the context of learning to act and plan.

Learning Latent Traits for Simulated Cooperative Driving Tasks

This paper builds a framework capable of capturing a compact latent representation of the human in terms of their behavior and preferences based on data from a simulated population of drivers, using a lightweight simulation environment for modelling one form of distracted driving behavior.

HMIway-env: A Framework for Simulating Behaviors and Preferences to Support Human-AI Teaming in Driving

A lightweight simulation and modeling framework, HMIway-env, for studying human-machine teaming in the context of driving is introduced, incorporating models for distracted and cautious driving, and early experimental results toward the training of better intervention policies are shown.

Growing knowledge culturally across generations to solve novel, complex tasks

It is suggested that language provides a sufficient medium to express and accumulate the knowledge people acquire in these diverse tasks: the dynamics of the environment, valuable goals, dangerous risks, and strategies for success.

Replay and compositional computation

A speculative hypothesis is proposed: that ‘replay’ in the brain implements a form of compositional computation where entities are assembled into meaningful structures.

The Neural Architecture of Theory-based Reinforcement Learning

A theory-based reinforcement learning model was used to analyze brain data from human participants learning to play different Atari-style video games while undergoing functional MRI, showing results consistent with a neural architecture in which top-down theory representations originating in prefrontal regions shape sensory predictions in visual areas.

Model-Based Reinforcement Learning for Atari

Simulated Policy Learning (SimPLe), a complete model-based deep RL algorithm based on video prediction models, is described and a comparison of several model architectures is presented, including a novel architecture that yields the best results in the authors' setting.
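The SimPLe recipe described above, learning a model of the environment from real transitions and then training a policy entirely inside that model, can be sketched in tabular form. The 5-state chain environment and all names below are illustrative assumptions standing in for the paper's video-prediction setup:

```python
import random

# Illustrative 5-state chain: actions move left (-1) or right (+1),
# and the agent is rewarded for reaching the goal state.
N_STATES, GOAL = 5, 4

def real_step(s, a):
    s2 = max(0, min(N_STATES - 1, s + a))
    return s2, (1.0 if s2 == GOAL else 0.0)

def learn_model(n_transitions=200, seed=0):
    """Collect real experience and fit a tabular world model."""
    rng = random.Random(seed)
    model, s = {}, 0
    for _ in range(n_transitions):
        a = rng.choice([-1, 1])
        s2, r = real_step(s, a)
        model[(s, a)] = (s2, r)   # deterministic env: one sample per (s, a) suffices
        s = s2
    return model

def plan_in_model(model, gamma=0.9, sweeps=50):
    """Value iteration run entirely inside the learned model, no real interaction."""
    V = [0.0] * N_STATES
    for _ in range(sweeps):
        for s in range(N_STATES):
            qs = [r + gamma * V[s2]
                  for (ms, a), (s2, r) in model.items() if ms == s]
            if qs:
                V[s] = max(qs)
    # Greedy policy with respect to the learned model's values.
    return {s: max(((a, r + gamma * V[s2]) for (ms, a), (s2, r) in model.items()
                    if ms == s), key=lambda t: t[1])[0]
            for s in range(N_STATES)
            if any(ms == s for (ms, _a) in model)}
```

Every planning step after data collection touches only the learned model, which is the sample-efficiency argument for model-based RL: real experience is spent once on fitting the model, not on every policy update.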

Human-level control through deep reinforcement learning

This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.

Learning to Play with Intrinsically-Motivated Self-Aware Agents

This work proposes a "world-model" network that learns to predict the dynamic consequences of the agent's actions, and demonstrates that this policy causes the agent to explore novel and informative interactions with its environment, leading to the generation of a spectrum of complex behaviors.
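The idea of a world model whose prediction errors drive exploration can be sketched as an intrinsic reward signal. The running-average "model" and all names below are illustrative assumptions, not the paper's network:

```python
def make_world_model():
    """A trivial world model: a running estimate of each action's outcome."""
    return {}

def predict(model, action):
    return model.get(action, 0.0)

def update(model, action, outcome, lr=0.5):
    """Move the model's prediction toward the observed outcome."""
    model[action] = predict(model, action) + lr * (outcome - predict(model, action))

def intrinsic_reward(model, action, outcome):
    """Curiosity bonus: the model's surprise at what actually happened."""
    return abs(outcome - predict(model, action))

model = make_world_model()
first = intrinsic_reward(model, "push", 1.0)   # 1.0: the outcome is fully novel
update(model, "push", 1.0)
second = intrinsic_reward(model, "push", 1.0)  # 0.5: the model has partly learned it
```

Repeating an action drives its bonus toward zero, pushing the agent toward interactions it cannot yet predict, which is the self-reinforcing exploration dynamic the TLDR describes.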

Strategic Object Oriented Reinforcement Learning

This work introduces strategic object oriented reinforcement learning (SOORL), which learns a simple dynamics model through automatic model selection and performs efficient planning with strategic exploration in a model-based setting in which exact planning is impossible.

Relational Deep Reinforcement Learning

We introduce an approach for deep reinforcement learning (RL) that improves upon the efficiency, generalization capacity, and interpretability of conventional approaches through structured perception and relational reasoning.

Human-level performance in first-person multiplayer games with population-based deep reinforcement learning

It is demonstrated for the first time that an agent can achieve human-level performance in a popular 3D multiplayer first-person video game, Quake III Arena Capture the Flag, using only pixels and game points as input.

Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning

The central idea is to use slow planning-based agents to provide training data for a deep-learning architecture capable of real-time play; new agents based on this idea are proposed and shown to outperform DQN.
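The distillation scheme in that summary, a slow planner labelling states and a fast policy mimicking it, can be sketched minimally. The number-line "planner" and lookup-table "policy" below are illustrative stand-ins for Monte-Carlo tree search and a deep network:

```python
def slow_planner(state, goal):
    """Stand-in for an expensive planner: the exact best move toward the goal."""
    return 1 if goal > state else -1

def distill(states, goal):
    """Fast policy fit to planner-labelled data (here, just a lookup table)."""
    return {s: slow_planner(s, goal) for s in states}

# The planner is consulted once per training state, offline; at play time
# the distilled policy answers instantly with no search.
policy = distill(range(10), goal=7)
```

The point of the construction is the cost asymmetry: planning is paid once during data generation, while the distilled policy is cheap enough for real-time play.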

Causal Reasoning from Meta-reinforcement Learning

It is suggested that causal reasoning in complex settings may benefit from the more end-to-end learning-based approaches presented here; this work also offers new strategies for structured exploration in reinforcement learning by giving agents the ability to perform, and interpret, experiments.

Rapid trial-and-error learning with simulation supports flexible tool use and physical reasoning

This work proposes that the flexibility of human physical problem solving rests on an ability to imagine the effects of hypothesized actions, while the efficiency of human search arises from rich action priors which are updated via observations of the world.

Learning Visual Predictive Models of Physics for Playing Billiards

This paper explores how an agent can be equipped with an internal model of the dynamics of the external world, and how it can use this model to plan novel actions by running multiple internal simulations ("visual imagination").
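Planning by "visual imagination", rolling candidate actions forward through an internal dynamics model and keeping the best, can be sketched with toy physics. The friction model and all names below are illustrative assumptions, not the paper's learned visual model:

```python
def internal_model(position, velocity, steps=10, friction=0.9):
    """Imagined rollout: the ball slides and slows a little each step."""
    for _ in range(steps):
        position += velocity
        velocity *= friction
    return position

def plan_shot(start, target, candidates):
    """Imagine each candidate action and pick the one landing nearest the target."""
    return min(candidates, key=lambda v: abs(internal_model(start, v) - target))

# Five candidate strike velocities, evaluated purely in imagination.
best = plan_shot(start=0.0, target=5.0, candidates=[0.2, 0.5, 0.8, 1.1, 1.4])
```

No real action is taken until the internal simulations have been compared, which is the sense in which an internal model substitutes imagined experience for real trial and error.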