PowerPlay: Training an Increasingly General Problem Solver by Continually Searching for the Simplest Still Unsolvable Problem

@article{Schmidhuber2013PowerPlayTA,
  title={PowerPlay: Training an Increasingly General Problem Solver by Continually Searching for the Simplest Still Unsolvable Problem},
  author={J{\"u}rgen Schmidhuber},
  journal={Frontiers in Psychology},
  year={2013},
  volume={4}
}
  • J. Schmidhuber
  • Published 22 December 2011
  • Computer Science
  • Frontiers in Psychology
Most of computer science focuses on automatically solving given computational problems. I focus on automatically inventing or discovering problems in a way inspired by the playful behavior of animals and humans, to train a more and more general problem solver from scratch in an unsupervised fashion. Consider the infinite set of all computable descriptions of tasks with possibly computable solutions. Given a general problem-solving architecture, at any given time, the novel algorithmic framework… 
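The abstract above describes a concrete search loop. As a rough illustration, here is a minimal Python sketch of that loop under stated assumptions: `Solver`, `enumerate_candidates`, `modified`, and `solves` are hypothetical placeholders rather than the paper's implementation, and the enumeration order stands in for the paper's notion of "simplest still unsolvable problem first".

```python
# Minimal sketch of the PowerPlay-style loop described in the abstract (not the
# paper's code). `enumerate_candidates`, `Solver.modified`, and `Solver.solves`
# are hypothetical placeholders for a concrete task encoding and solver.

def powerplay(solver, enumerate_candidates):
    """Repeatedly accept the cheapest (task, solver-modification) pair such that
    the modified solver solves the new task and all previously solved tasks."""
    repertoire = []  # tasks solved so far, in order of discovery
    while True:
        # Candidates are assumed to arrive ordered by increasing search cost,
        # i.e. the simplest still-unsolvable task and cheapest modification first.
        for task, modification in enumerate_candidates(solver, repertoire):
            candidate = solver.modified(modification)
            solves_new = candidate.solves(task) and not solver.solves(task)
            no_forgetting = all(candidate.solves(t) for t in repertoire)
            if solves_new and no_forgetting:
                solver = candidate        # accept the modification
                repertoire.append(task)   # grow the repertoire of solved tasks
                break                     # restart the search from the new solver
```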
Continually adding self-invented problems to the repertoire: First experiments with POWERPLAY
TLDR
A self-delimiting recurrent neural network (SLIM RNN) is used as the general computational architecture implementing the system's solver, which learns to become an increasingly general problem solver, continually adding new problem-solving procedures to its growing repertoire and exhibiting interesting developmental stages.
First Experiments with PowerPlay
ToyArchitecture: Unsupervised Learning of Interpretable Models of the World
TLDR
This work presents a novel, purposely simple, and interpretable hierarchical architecture which combines multiple different mechanisms into one system: unsupervised learning of a model of the world, learning the influence of one's own actions on the world, model-based reinforcement learning, hierarchical planning and plan execution, and symbolic/sub-symbolic integration in general.
Reverse Curriculum Generation for Reinforcement Learning
TLDR
This work proposes a method to learn goal-oriented tasks without requiring any prior knowledge other than obtaining a single state in which the task is achieved, and generates a curriculum of start states that adapts to the agent's performance, leading to efficient training on goal-oriented tasks.
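As a rough illustration of the mechanism summarized above, the following hedged Python sketch expands a set of start states backwards from the goal and keeps only starts of intermediate difficulty; the helper names, thresholds, and walk length are assumptions, not the paper's interface.

```python
# Hedged sketch of reverse curriculum generation from a single goal state.
# `env.random_walk_from`, `estimate_success`, and the thresholds are
# illustrative assumptions rather than the paper's actual API.

R_MIN, R_MAX = 0.1, 0.9   # keep starts that are neither too hard nor too easy

def reverse_curriculum(env, agent, estimate_success, goal_state,
                       iterations=100, walk_len=10):
    starts = [goal_state]                 # begin right at the goal
    for _ in range(iterations):
        # Propose new start states via short random walks outward from old ones.
        proposals = [env.random_walk_from(s, steps=walk_len) for s in starts]
        candidates = starts + proposals
        # Train from the candidate starts and estimate per-start success rates.
        rates = {s: estimate_success(agent, env, s) for s in candidates}
        # Retain starts of intermediate difficulty; the frontier thus expands
        # backwards from the goal as the agent improves.
        kept = [s for s in candidates if R_MIN <= rates[s] <= R_MAX]
        starts = kept or starts           # never let the start set collapse
    return agent
```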
Multi-task Deep Reinforcement Learning with PopArt
TLDR
This work proposes to automatically adapt the contribution of each task to the agent’s updates, so that all tasks have a similar impact on the learning dynamics, and learns a single trained policy that exceeds median human performance on this multi-task domain.
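The adaptive per-task contribution can be illustrated with a hedged NumPy sketch of a PopArt-style value head: running statistics of the return targets are tracked per task, and the head's weights are rescaled whenever the statistics change so that unnormalized predictions are preserved. The shapes, step size, and class interface are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch of adaptive target normalization with output-preserving
# rescaling of the value head (PopArt-style); constants are illustrative.
import numpy as np

class PopArtHead:
    def __init__(self, n_tasks, n_features, beta=3e-4):
        self.w = np.zeros((n_tasks, n_features))   # per-task value-head weights
        self.b = np.zeros(n_tasks)                 # per-task biases
        self.mu = np.zeros(n_tasks)                # running mean of return targets
        self.nu = np.ones(n_tasks)                 # running second moment
        self.beta = beta                           # statistics step size

    def sigma(self):
        return np.sqrt(np.maximum(self.nu - self.mu ** 2, 1e-8))

    def update_stats(self, task, target):
        """Update (mu, sigma) for one task, then rescale its head so the
        unnormalized value prediction is unchanged (the output-preserving step)."""
        old_mu, old_sigma = self.mu[task], self.sigma()[task]
        self.mu[task] += self.beta * (target - self.mu[task])
        self.nu[task] += self.beta * (target ** 2 - self.nu[task])
        new_sigma = self.sigma()[task]
        self.w[task] *= old_sigma / new_sigma
        self.b[task] = (old_sigma * self.b[task] + old_mu - self.mu[task]) / new_sigma

    def value(self, task, features):
        # Unnormalized value = sigma * normalized_output + mu.
        return self.sigma()[task] * (self.w[task] @ features + self.b[task]) + self.mu[task]
```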
One Big Net For Everything
TLDR
Incremental training of an increasingly general problem solver, which continually learns to solve new tasks without forgetting previous skills, is applied to greatly speed up subsequent learning of additional, novel but algorithmically related skills.
BeBold: Exploration Beyond the Boundary of Explored Regions
TLDR
The regulated difference of inverse visitation counts is proposed as a simple but effective criterion for intrinsic reward (IR) that helps the agent explore beyond the boundary of explored regions and mitigates common issues in count-based methods, such as short-sightedness and detachment.
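Read literally, a "regulated difference of inverse visitation counts" suggests an intrinsic reward like the hedged sketch below; the exact count definition, the episodic gating, and the scaling are assumptions for illustration, not the paper's precise formulation.

```python
# Hedged sketch of a clipped ("regulated") difference of inverse visitation
# counts, paid only on the first visit to a state within the episode.
from collections import defaultdict

visit_counts = defaultdict(int)        # lifelong state-visitation counts

def intrinsic_reward(s, s_next, episodic_visited, scale=1.0):
    visit_counts[s_next] += 1
    # Reward frontier transitions: the successor must be rarer than the predecessor.
    novelty_gain = max(1.0 / visit_counts[s_next] - 1.0 / max(visit_counts[s], 1), 0.0)
    # Only pay the bonus the first time s_next is reached in the current episode.
    first_visit = s_next not in episodic_visited
    episodic_visited.add(s_next)
    return scale * novelty_gain * float(first_visit)
```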
Open-Ended Learning Leads to Generally Capable Agents
TLDR
The red player’s goal is to put both the purple cube and the black cube (its own cube) onto its base (the grey floor), while the blue player tries to put them on the blue floor – the cubes are used as flags.
ToyArchitecture: Unsupervised learning of interpretable models of the environment
TLDR
This paper presents a novel, purposely simple, and interpretable hierarchical architecture that incorporates the unsupervised learning of a model of the environment, learning the influence of one’s own actions, model-based reinforcement learning, hierarchical planning, and symbolic/sub-symbolic integration in general.
Competitive Experience Replay
TLDR
This work proposes a novel method called competitive experience replay, which efficiently supplements a sparse reward by placing learning in the context of an exploration competition between a pair of agents, creating a competitive game designed to drive exploration.
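One plausible reading of the competition described above is sketched below: during replay, one agent's transitions are penalized where its visited states overlap with its competitor's, while the competitor's overlapping transitions are rewarded. The reward signs, the distance threshold, and the batching scheme are assumptions, not the paper's exact formulation.

```python
# Hedged sketch of competitive relabeling of replayed transitions between two
# agents A and B; all specifics here are illustrative assumptions.
import numpy as np

def relabel_with_competition(batch_a, batch_b, threshold=0.1):
    """Each batch is a list of (state, action, reward, next_state) transitions."""
    sa = np.array([t[0] for t in batch_a], dtype=float)
    sb = np.array([t[0] for t in batch_b], dtype=float)
    # Pairwise distances between the two agents' visited states.
    dists = np.linalg.norm(sa[:, None, :] - sb[None, :, :], axis=-1)
    a_matched = (dists < threshold).any(axis=1)   # A's states also reached by B
    b_matched = (dists < threshold).any(axis=0)   # B's states also reached by A
    # A is penalized where it was "caught"; B is rewarded for matching A.
    new_a = [(s, a, r - float(m), s2) for (s, a, r, s2), m in zip(batch_a, a_matched)]
    new_b = [(s, a, r + float(m), s2) for (s, a, r, s2), m in zip(batch_b, b_matched)]
    return new_a, new_b
```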
...

References

Showing 1-10 of 132 references
Continually adding self-invented problems to the repertoire: First experiments with POWERPLAY
TLDR
A self-delimiting recurrent neural network (SLIM RNN) is used as the general computational architecture implementing the system's solver, which learns to become an increasingly general problem solver, continually adding new problem-solving procedures to its growing repertoire and exhibiting interesting developmental stages.
Optimal Ordered Problem Solver
TLDR
An efficient, recursive, backtracking-based way of implementing OOPS on realistic computers with limited storage is introduced, and experiments illustrate how OOPS can greatly profit from metalearning or metasearching, that is, searching for faster search procedures.
First Experiments with PowerPlay
Bias-Optimal Incremental Problem Solving
Given a problem sequence and a probability distribution (the bias) over programs computing solution candidates, we present an optimally fast way of incrementally solving each task in the sequence.
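To make "optimally fast" concrete, here is a hedged sketch of the bias-optimal principle: computation time is allotted to candidate programs in proportion to their prior probability (the bias), and the total budget is doubled until the current task is solved. `run_for` and the budget schedule are illustrative assumptions, not the paper's algorithm in full.

```python
# Hedged sketch of bias-optimal search: each candidate program p receives compute
# in proportion to its prior probability P(p). `run_for` is a hypothetical helper
# that runs a program on the task for a bounded number of steps and reports success.

def bias_optimal_search(programs_with_prior, task, run_for, initial_budget=1):
    """programs_with_prior: iterable of (program, prior_probability) pairs."""
    budget = initial_budget
    while True:
        for program, prior in programs_with_prior:
            # Time slice proportional to the program's prior probability (the bias).
            steps = max(1, int(budget * prior))
            if run_for(program, task, steps):
                return program
        budget *= 2     # no solver found yet: double the total budget and retry
```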
Gödel Machines: Fully Self-referential Optimal Universal Self-improvers
TLDR
The first class of mathematically rigorous, general, fully self-referential, self-improving, optimally efficient problem solvers is presented, which not only boasts an optimal order of complexity but can optimally reduce any slowdowns hidden by the O()-notation, provided the utility of such speed-ups is provable at all.
Ultimate Cognition à la Gödel
TLDR
An agent-controlling program that speaks about itself, ready to rewrite itself in arbitrary fashion once it has found a proof that the rewrite is useful according to a user-defined utility function is described.
Developmental robotics, optimal artificial curiosity, creativity, music, and the fine arts
TLDR
It is pointed out how the fine arts can be formally understood as a consequence of the basic principle: given some subjective observer, great works of art and music yield observation histories exhibiting more novel, previously unknown compressibility/regularity/predictability than lesser works, thus deepening the observer’s understanding of the world and what is possible in it.
Artificial curiosity based on discovering novel algorithmic predictability through coevolution
  • J. Schmidhuber
  • Computer Science
    Proceedings of the 1999 Congress on Evolutionary Computation (CEC99)
  • 1999
TLDR
A "curious" embedded agent that differs from previous explorers in the sense that it can limit its predictions to fairly arbitrary, computable aspects of event sequences and thus can explicitly ignore almost arbitrary unpredictable, random aspects.
Exploring the predictable
TLDR
This work studies an embedded active learner that can limit its predictions to almost arbitrary computable aspects of spatio-temporal events and constructs probabilistic algorithms that map event sequences to abstract internal representations (IRs), and predicts IRs from IRs computed earlier.
...