Corpus ID: 57573844

Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions

Rui Wang, Joel Lehman, Jeff Clune, Kenneth O. Stanley
While the history of machine learning so far largely encompasses a series of problems posed by researchers and algorithms that learn their solutions, an important question is whether the problems themselves can be generated by the algorithm at the same time as they are being solved. Key Result: Our results show that POET produces a diverse range of sophisticated behaviors that solve a wide range of environmental challenges, many of which cannot be solved by direct optimization alone, or even through a…


Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions

This work introduces and empirically validates two new innovations to the original POET algorithm, as well as two external innovations designed to help elucidate its full potential, which enable the most open-ended algorithmic demonstration to date.

Increasing generality in machine learning through procedural content generation

Existing work on PCG, its overlap with current efforts in ML, and promising new research directions such as procedurally generated learning environments are reviewed, including how PCG may be crucial for training agents which generalise well.

It Takes Four to Tango: Multiagent Selfplay for Automatic Curriculum Generation

Curriculum Self Play (CuSP), an automated goal generation framework that seeks to satisfy desiderata by virtue of a multi-player game with 4 agents, is proposed and succeeds at generating effective curricula of goals for a range of control tasks.

Open-Ended Learning Strategies for Learning Complex Locomotion Skills

This work adapts the Enhanced Paired Open-Ended Trailblazer (ePOET) approach to train more complex agents to walk efficiently on complex three-dimensional terrains, and combines ePOET with Soft Actor-Critic off-policy optimization, yielding ePOET-SAC, so that the agent can learn more diverse skills to solve more challenging tasks.

Evolving Curricula with Regret-Based Environment Design

This work proposes harnessing the power of evolution in a principled, regret-based curriculum that seeks to constantly produce levels at the frontier of an agent’s capabilities, resulting in curricula that start simple but become increasingly complex.

Trying AGAIN instead of Trying Longer: Prior Learning for Automatic Curriculum Learning

A two-stage ACL approach is proposed where a teacher algorithm first learns to train a DRL agent with a high-exploration curriculum, and then distills learned priors from the first run to generate an "expert curriculum" to re-train the same agent from scratch.

MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge

This work introduces MineDojo, a new framework built on the popular Minecraft game that features a simulation suite with thousands of diverse open-ended tasks and an internet-scale knowledge base with Minecraft videos, tutorials, wiki pages, and forum discussions, and proposes a novel agent learning algorithm that leverages large pre-trained video-language models as a learned reward function.

Meta Automatic Curriculum Learning

This work presents AGAIN, a first instantiation of Meta-ACL, and showcases its benefits for curriculum generation over classical ACL in multiple simulated environments including procedurally generated parkour environments with learners of varying morphologies.

Meta-learning curiosity algorithms

This work proposes a strategy for encoding curiosity algorithms as programs in a domain-specific language and searching, during a meta-learning phase, for algorithms that enable RL agents to perform well in new domains.

OPEn: An Open-ended Physics Environment for Learning Without a Task

This paper builds a benchmark Open-ended Physics Environment (OPEn), designs several tasks to explicitly test learning representations in this environment, and tests several existing RL-based exploration methods on this benchmark, finding that an agent using unsupervised contrastive learning for representation learning and impact-driven learning for exploration achieved the best results.



Exploiting Open-Endedness to Solve Problems Through the Search for Novelty

Once open-ended search is decoupled from artificial life worlds, the raw search for novelty can be applied to real-world problems, and it significantly outperforms objective-based search in the deceptive maze navigation task.

PowerPlay: Training an Increasingly General Problem Solver by Continually Searching for the Simplest Still Unsolvable Problem

This work focuses on automatically inventing or discovering problems in a way inspired by the playful behavior of animals and humans, to train a more and more general problem solver from scratch in an unsupervised fashion.

Challenges in coevolutionary learning: arms-race dynamics, open-endedness, and mediocre stable states

The results show that subtle changes to the game determine whether it is open-ended, and profoundly affect the existence and nature of an arms race.

Minimal criterion coevolution: a new approach to open-ended search

This paper investigates the extent to which interactions between two coevolving populations, both subject to their own constraint, or minimal criterion, can produce results that are both functional and diverse even without any behavior characterization or novelty archive.

Episodic Curiosity through Reachability

A new curiosity method is proposed that uses episodic memory to form the novelty bonus, based on how many environment steps it takes to reach the current observation from those in memory, which incorporates rich information about environment dynamics.

Teacher–Student Curriculum Learning

We propose Teacher–Student Curriculum Learning (TSCL), a framework for automatic curriculum learning, where the Student tries to learn a complex task, and the Teacher automatically chooses subtasks

A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play

This paper generalizes the AlphaGo Zero approach into a single AlphaZero algorithm that can achieve superhuman performance in many challenging games, and that convincingly defeated a world champion program in the games of chess and shogi (Japanese chess), as well as Go.

The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract)

The promise of ALE is illustrated by developing and benchmarking domain-independent agents designed using well-established AI techniques for both reinforcement learning and planning, and an evaluation methodology made possible by ALE is proposed.

Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning

The computational efficiency of IMGEPs is illustrated, as these robotic experiments use simple memory-based low-level policy representations and a simple search algorithm, enabling the whole system to learn online and incrementally on a Raspberry Pi 3.

Mastering the game of Go without human knowledge

An algorithm based solely on reinforcement learning is introduced, without human data, guidance or domain knowledge beyond game rules, that achieves superhuman performance, winning 100–0 against the previously published, champion-defeating AlphaGo.