# PowerPlay: Training an Increasingly General Problem Solver by Continually Searching for the Simplest Still Unsolvable Problem

@article{Schmidhuber2013PowerPlayTA, title={PowerPlay: Training an Increasingly General Problem Solver by Continually Searching for the Simplest Still Unsolvable Problem}, author={J{\"u}rgen Schmidhuber}, journal={Frontiers in Psychology}, year={2013}, volume={4} }

Most of computer science focuses on automatically solving given computational problems. I focus on automatically inventing or discovering problems in a way inspired by the playful behavior of animals and humans, to train a more and more general problem solver from scratch in an unsupervised fashion. Consider the infinite set of all computable descriptions of tasks with possibly computable solutions. Given a general problem-solving architecture, at any given time, the novel algorithmic framework…

## 116 Citations

Continually adding self-invented problems to the repertoire: First experiments with POWERPLAY

- Computer Science, 2012 IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL)
- 2012

A self-delimiting recurrent neural network (SLIM RNN) is used as a general computational architecture to implement the system's solver, which learns to become an increasingly general problem solver, continually adding new problem-solving procedures to its growing repertoire and exhibiting interesting developmental stages.

First Experiments with PowerPlay

- Computer Science, Neural Networks: the official journal of the International Neural Network Society
- 2013

ToyArchitecture: Unsupervised Learning of Interpretable Models of the World

- Computer Science, ArXiv
- 2019

This work presents a novel, purposely simple, and interpretable hierarchical architecture which combines multiple different mechanisms into one system: unsupervised learning of a model of the world, learning the influence of one’s own actions on the world, model-based reinforcement learning, hierarchical planning and plan execution, and symbolic/sub-symbolic integration in general.

Reverse Curriculum Generation for Reinforcement Learning

- Computer Science, CoRL
- 2017

This work proposes a method to learn goal-oriented tasks without requiring any prior knowledge other than a single state in which the task is achieved. It generates a curriculum of start states that adapts to the agent's performance, leading to efficient training on goal-oriented tasks.

Multi-task Deep Reinforcement Learning with PopArt

- Computer Science, AAAI
- 2019

This work proposes to automatically adapt the contribution of each task to the agent’s updates, so that all tasks have a similar impact on the learning dynamics, and learns a single trained policy that exceeds median human performance on this multi-task domain.

One Big Net For Everything

- Computer Science, ArXiv
- 2018

Incremental training of an increasingly general problem solver, which continually learns to solve new tasks without forgetting previous skills, is applied to greatly speed up subsequent learning of additional, novel but algorithmically related skills.

BeBold: Exploration Beyond the Boundary of Explored Regions

- Computer Science, ArXiv
- 2020

The regulated difference of inverse visitation counts is proposed as a simple but effective criterion for intrinsic reward (IR) that helps the agent explore beyond the boundary of explored regions and mitigates common issues in count-based methods, such as short-sightedness and detachment.
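As a rough illustration of what a "regulated difference of inverse visitation counts" could look like, the sketch below computes a bonus of the form max(1/N(s') - 1/N(s), 0) along a toy trajectory. The clipping at zero is one reading of "regulated", and the function names and the tabular `Counter` are assumptions made for this example. The cited paper may use additional terms that are not shown here.

```python
from collections import Counter


def intrinsic_reward(counts: Counter, state, next_state) -> float:
    """Clipped difference of inverse visit counts (assumed form).

    The bonus is positive only when next_state has been visited less
    often than state, i.e. when the step moves toward the frontier.
    """
    # Guard against division by zero for states never counted.
    n_curr = max(counts[state], 1)
    n_next = max(counts[next_state], 1)
    return max(1.0 / n_next - 1.0 / n_curr, 0.0)


def rollout_rewards(trajectory):
    """Return the bonus for each transition of a state sequence."""
    counts = Counter()
    counts[trajectory[0]] += 1  # the start state counts as visited
    rewards = []
    for state, next_state in zip(trajectory, trajectory[1:]):
        counts[next_state] += 1
        rewards.append(intrinsic_reward(counts, state, next_state))
    return rewards


# Hypothetical trajectory: linger in "a", step to the new state "b",
# then return to the well-visited "a".
rewards = rollout_rewards(["a", "a", "a", "b", "a"])
```

In this toy run only the step into the unvisited state `"b"` earns a bonus. Staying in `"a"` or returning to it yields zero, which is the behaviour the clipping is meant to produce.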

Open-Ended Learning Leads to Generally Capable Agents

- Computer Science, ArXiv
- 2021

The red player’s goal is to put both the purple cube and the black cube (its own cube) onto its base (the grey floor), while the blue player tries to put them on the blue floor – the cubes are used as flags.

ToyArchitecture: Unsupervised learning of interpretable models of the environment

- Computer Science, PLoS ONE
- 2020

This paper presents a novel, purposely simple, and interpretable hierarchical architecture that incorporates the unsupervised learning of a model of the environment, learning the influence of one’s own actions, model-based reinforcement learning, hierarchical planning, and symbolic/sub-symbolic integration in general.

Competitive Experience Replay

- Computer Science
- 2019

This work proposes a novel method called competitive experience replay, which efficiently supplements a sparse reward by placing learning in the context of an exploration competition between a pair of agents, creating a competitive game designed to drive exploration.

## References


Continually adding self-invented problems to the repertoire: First experiments with POWERPLAY

- Computer Science, 2012 IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL)
- 2012

A self-delimiting recurrent neural network (SLIM RNN) is used as a general computational architecture to implement the system's solver, which learns to become an increasingly general problem solver, continually adding new problem-solving procedures to its growing repertoire and exhibiting interesting developmental stages.

Optimal Ordered Problem Solver

- Computer Science, Machine Learning
- 2004

An efficient, recursive, backtracking-based way of implementing OOPS on realistic computers with limited storage is introduced, and experiments illustrate how OOPS can greatly profit from metalearning or metasearching, that is, searching for faster search procedures.

First Experiments with PowerPlay

- Computer Science, Neural Networks: the official journal of the International Neural Network Society
- 2013

Bias-Optimal Incremental Problem Solving

- Computer Science, NIPS
- 2002

Given is a problem sequence and a probability distribution (the bias) on programs computing solution candidates. We present an optimally fast way of incrementally solving each task in the sequence.…

Gödel Machines: Fully Self-referential Optimal Universal Self-improvers

- Computer Science, Artificial General Intelligence
- 2007

The first class of mathematically rigorous, general, fully self-referential, self-improving, optimally efficient problem solvers is presented, which not only boasts an optimal order of complexity but can optimally reduce any slowdowns hidden by the O()-notation, provided the utility of such speed-ups is provable at all.

Ultimate Cognition à la Gödel

- Computer Science, Cognitive Computation
- 2009

This paper describes an agent-controlling program that speaks about itself and is ready to rewrite itself in arbitrary fashion once it has found a proof that the rewrite is useful according to a user-defined utility function.

Developmental robotics, optimal artificial curiosity, creativity, music, and the fine arts

- Art, Connect. Sci.
- 2006

It is pointed out how the fine arts can be formally understood as a consequence of the basic principle: given some subjective observer, great works of art and music yield observation histories exhibiting more novel, previously unknown compressibility/regularity/predictability than lesser works, thus deepening the observer’s understanding of the world and what is possible in it.

Artificial curiosity based on discovering novel algorithmic predictability through coevolution

- Computer Science, Proceedings of the 1999 Congress on Evolutionary Computation-CEC99 (Cat. No. 99TH8406)
- 1999

This work presents a "curious" embedded agent that differs from previous explorers in that it can limit its predictions to fairly arbitrary, computable aspects of event sequences and can thus explicitly ignore almost arbitrary unpredictable, random aspects.

Exploring the predictable

- Computer Science
- 2003

This work studies an embedded active learner that can limit its predictions to almost arbitrary computable aspects of spatio-temporal events and constructs probabilistic algorithms that map event sequences to abstract internal representations (IRs), and predicts IRs from IRs computed earlier.