Solving Hard AI Planning Instances Using Curriculum-Driven Deep Reinforcement Learning

Dieqiao Feng, Carla P. Gomes, and Bart Selman. Published in the proceedings of the International Joint Conference on Artificial Intelligence (IJCAI).
Despite significant progress in general AI planning, certain domains remain out of reach of current AI planning systems. Sokoban is a PSPACE-complete planning task and represents one of the hardest domains for current AI planners. Even domain-specific specialized search methods fail quickly due to the exponential search complexity on hard instances. Our approach based on deep reinforcement learning augmented with a curriculum-driven method is the first one to solve hard instances within one day… 
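The curriculum-driven training described in the abstract can be sketched as a loop that only moves to harder instances once easier ones are solved reliably. This is a minimal illustration, not the paper's implementation; `solve_rate` and the difficulty-keyed instance pool are hypothetical stand-ins:

```python
import random

def curriculum_train(instances, solve_rate, threshold=0.8, max_rounds=100):
    """Toy curriculum loop: advance to harder instances only once the
    current difficulty level is solved reliably.

    instances:  dict mapping difficulty level -> list of training instances
    solve_rate: stand-in for "train the agent on this batch and report
                its success rate"
    """
    history = []
    for level in sorted(instances):
        for _ in range(max_rounds):  # cap training rounds per level
            pool = instances[level]
            batch = random.sample(pool, min(3, len(pool)))
            rate = solve_rate(level, batch)
            history.append((level, rate))
            if rate >= threshold:
                break  # level mastered; move on to the next difficulty
    return history
```

The key design choice illustrated here is that difficulty increases gradually, so the agent never faces an instance far beyond its current competence.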


A Novel Automated Curriculum Strategy to Solve Hard Sokoban Planning Instances

This work presents a novel automated curriculum approach that dynamically selects training instances of varying task complexity from an unlabeled pool, guided by the authors' difficulty quantum momentum strategy, and shows how the smoothness of the task hardness impacts the final learning results.

Graph Value Iteration

This work proposes a domain-independent method that augments graph search with graph value iteration to solve hard planning instances that are out of reach for domain-specialized solvers and shows how a curriculum strategy is used to smooth the learning process.
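Value iteration over an explicit state graph, as summarized above, can be sketched as follows; this simplified version assumes unit-cost edges and a known goal set, so the value of a state converges to its distance from the nearest goal:

```python
def graph_value_iteration(graph, goals, iters=100):
    """Minimal value-iteration sketch on an explicit state graph.

    graph: dict mapping state -> list of successor states
    goals: set of goal states
    V[s] converges to the fewest steps from s to any goal (inf if unreachable).
    """
    INF = float("inf")
    V = {s: (0.0 if s in goals else INF) for s in graph}
    for _ in range(iters):
        changed = False
        for s, succs in graph.items():
            if s in goals:
                continue
            # Bellman backup with unit edge costs
            best = min((1 + V[t] for t in succs), default=INF)
            if best < V[s]:
                V[s] = best
                changed = True
        if not changed:
            break  # values have converged
    return V
```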

Left Heavy Tails and the Effectiveness of the Policy and Value Networks in DNN-based best-first search for Sokoban Planning

The experiments show the critical role of the policy network as a powerful heuristic guiding the search, which can lead to left heavy tails with polynomial scaling by avoiding exploring exponentially sized subtrees.
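The role of a learned heuristic in best-first search can be illustrated with a small sketch. Here `score` is a hypothetical stand-in for the policy/value network's estimate; states with lower scores are expanded first, which is how a good heuristic avoids exponentially sized subtrees:

```python
import heapq

def best_first_search(start, successors, is_goal, score):
    """Best-first search guided by a heuristic `score(state)`;
    lower-scoring states are expanded first."""
    frontier = [(score(start), start, [start])]
    seen = {start}
    while frontier:
        _, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (score(nxt), nxt, [*path, nxt]))
    return None  # search space exhausted without reaching a goal
```

On a toy integer domain the heuristic steers the search straight toward the goal, visiting only states on a near-optimal path.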

Variational Automatic Curriculum Learning for Sparse-Reward Cooperative Multi-Agent Problems

A curriculum learning algorithm, Variational Automatic Curriculum Learning (VACL), is presented for solving challenging goal-conditioned cooperative multi-agent reinforcement learning problems; it reproduces the ramp-use behavior originally shown in OpenAI’s hide-and-seek project.

Solving Sokoban with forward-backward reinforcement learning

This work first trains a backward-looking agent with a simple relaxed goal, and then augments the state representation of the forward-looking agent with straightforward hint features, which allows the learned forward agent to leverage information from backward plans without mimicking their policy.

Model-Based Deep Reinforcement Learning for High-Dimensional Problems, a Survey

A taxonomy is proposed based on three approaches: explicit planning over given transitions, explicit planning over learned transitions, and end-to-end learning of both planning and transitions.

Solving Sokoban with backward reinforcement learning

This work first trains a backward-looking agent with a simple relaxed goal, then augments the state representation of the puzzle with straightforward hint features extracted from that agent's behavior, and finally trains a forward-looking agent on this informed, augmented state.

CLR-DRNets: Curriculum Learning with Restarts to Solve Visual Combinatorial Games

This work proposes CLR-DRNets, a curriculum-learning-with-restarts framework to boost the performance of Deep Reasoning Nets and proposes an enhanced reasoning module for the DRNets framework for encoding these visual games.

Reinforcement learning algorithms for the Untangling of Braids

Reinforcement learning algorithms (Q-Learning and Deep Q-Learning) are used to tackle the problem of untangling braids, and the results provide evidence that, for both approaches, the more the system is trained, the better the untangling player gets at untangling braids.

Untangling Braids with Multi-Agent Q-Learning

The results provide evidence that the more the system is trained, the better the untangling player gets at untangling braids, and at the same time, the tangling player produces good examples of tangled braids.
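Both braid-untangling papers build on the standard tabular Q-learning update. A generic, self-contained sketch of that update follows; the braid environment itself is abstracted behind a hypothetical `step(s, a)` callback returning `(next_state, reward, done)`:

```python
import random

def q_learning(n_states, actions, step, episodes=500,
               alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning on a generic episodic environment."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda b: Q[(s, b)])
            s2, r, done = step(s, a)
            # one-step Q-learning target
            target = r if done else r + gamma * max(Q[(s2, b)] for b in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q
```

Deep Q-Learning replaces the table `Q` with a neural network trained toward the same target.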



Learning Generalized Reactive Policies using Deep Neural Networks

This work shows that a deep neural network can be used to learn and represent a generalized reactive policy (GRP) that maps a problem instance and a state to an action, and that the learned GRPs efficiently solve large classes of challenging problem instances.
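Executing a generalized reactive policy amounts to repeatedly applying the same (instance, state) -> action mapping until the goal is reached. A minimal sketch, with all callbacks as hypothetical stand-ins for the learned network and the planning domain:

```python
def run_reactive_policy(policy, instance, state, apply_action, is_goal,
                        max_steps=100):
    """Execute a generalized reactive policy (GRP): the same mapping
    (instance, state) -> action is applied at every step."""
    trace = [state]
    for _ in range(max_steps):
        if is_goal(instance, state):
            return trace  # goal reached; return the visited states
        state = apply_action(state, policy(instance, state))
        trace.append(state)
    return trace if is_goal(instance, state) else None  # step budget hit
```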

Structure and inference in classical planning

It is shown that many of the standard benchmark domains can be solved with almost no search or a polynomially bounded amount of search, once the structure of planning problems is taken into account.

Goal-Based Action Priors

This work develops a framework for goal and state dependent action priors that can be used to prune away irrelevant actions based on the robot’s current goal, thereby greatly accelerating planning in a variety of complex stochastic environments.
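The action-prior idea reduces to filtering the action set with a learned, goal-conditioned score before planning. A minimal sketch in which `prior(action, goal)` is a hypothetical stand-in for the learned model:

```python
def prune_actions(actions, goal, prior, threshold=0.1):
    """Keep only actions whose prior probability under the current goal
    exceeds a threshold; `prior(action, goal)` is a learned model."""
    kept = [a for a in actions if prior(a, goal) >= threshold]
    # fall back to the full action set if the prior prunes everything
    return kept or list(actions)
```

Shrinking the branching factor this way is what accelerates planning: the planner only expands actions the prior considers relevant to the goal.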

Imagination-Augmented Agents for Deep Reinforcement Learning

Imagination-Augmented Agents (I2As), a novel architecture for deep reinforcement learning combining model-free and model-based aspects, shows improved data efficiency, performance, and robustness to model misspecification compared to several baselines.

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

This paper generalises the approach into a single AlphaZero algorithm that achieves, tabula rasa, superhuman performance in many challenging domains, convincingly defeating a world-champion program in each case.

What good are actions? Accelerating learning using learned action priors

This work extends the method to base action priors on perceptual cues rather than absolute states, allowing these priors to transfer between tasks with differing state spaces and transition functions, and demonstrates experimentally the advantages of learning with action priors in a reinforcement learning context.

The first learning track of the international planning competition

The competition results show that at this stage no learning for planning system outperforms state-of-the-art planners in a domain independent manner across a wide range of domains, but systems appear to be close to providing such performance.

GRASP: A Search Algorithm for Propositional Satisfiability

Experimental results obtained from a large number of benchmarks indicate that applying the proposed conflict analysis techniques to SAT algorithms can be extremely effective for a large number of representative classes of SAT instances.

Playing Atari with Deep Reinforcement Learning

This work presents the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning, which outperforms all previous approaches on six of the games and surpasses a human expert on three of them.

PDDL2.1: An Extension to PDDL for Expressing Temporal Planning Domains

The syntax of the language, PDDL2.1, is described, which has considerable modelling power -- exceeding the capabilities of current planning technology -- and presents a number of important challenges to the research community.