Corpus ID: 235428613

Meta Reinforcement Learning for Heuristic Planning

@inproceedings{Gutierrez2021MetaRL,
  title={Meta Reinforcement Learning for Heuristic Planning},
  author={Ricardo Luna Gutierrez and Matteo Leonetti},
  booktitle={ICAPS},
  year={2021}
}
Heuristic planning has a central role in classical planning applications and competitions. Thanks to this success, there has been an increasing interest in using Deep Learning to create high-quality heuristics in a supervised fashion, learning from optimal solutions of previously solved planning problems. Meta-Reinforcement learning is a fast-growing research area concerned with learning, from many tasks, behaviours that can quickly generalize to new tasks from the same distribution of…
1 Citation


Reinforcement Learning for Classical Planning: Viewing Heuristics as Dense Reward Generators
TLDR
This paper proposes to leverage domain-independent heuristic functions commonly used in the classical planning literature to improve the sample efficiency of RL, and demonstrates on several classical planning domains that using classical heuristics for RL allows for good sample efficiency compared to sparse-reward RL.

References

SHOWING 1-10 OF 39 REFERENCES
Learning Generalized Reactive Policies using Deep Neural Networks
TLDR
This work shows that a deep neural network can be used to learn and represent a generalized reactive policy (GRP) that maps a problem instance and a state to an action, and that the learned GRPs efficiently solve large classes of challenging problem instances.
Learning to Rank for Synthesizing Planning Heuristics
TLDR
This work investigates learning heuristics for domain-specific planning and frames learning a heuristic as a learning to rank problem which is solved using a RankSVM formulation and introduces new methods for computing features that capture temporal interactions in an approximate plan.
Deep Learning for Cost-Optimal Planning: Task-Dependent Planner Selection
TLDR
This work suggests representing planning tasks by images, which makes it possible to exploit arguably one of the most commonly used and best developed techniques in deep learning, and presents various ways of building practically useful online portfolio-based planners.
Guiding Search with Generalized Policies for Probabilistic Planning
TLDR
By combining ASNet with MCTS, this work is able to improve the capability of an ASNet to generalize beyond the distribution of problems it was trained on, as well as enhance the navigation of the search space by MCTS.
Delfi: Online Planner Selection for Cost-Optimal Planning
TLDR
This planner abstract describes the techniques used to create Delfi, an online portfolio planner submitted to the optimal classical track of the International Planning Competition (IPC) 2018, and a collection of cost-optimal planners based on Fast Downward.
Training Deep Reactive Policies for Probabilistic Planning Problems
TLDR
The results show that effective deep reactive policies can be learned for many benchmark problems and that leveraging the planning problem description to define the network structure can be beneficial.
Towards learning domain-independent planning heuristics
TLDR
This work explores the possibility of obtaining domain-independent heuristic functions using machine learning to improve practical applicability of planning in systems for which the planning domains evolve at run time.
Learning Control Knowledge for Forward Search Planning
TLDR
This work introduces a novel feature space for representing control knowledge in terms of information computed via relaxed plan extraction, which has been a major source of success for non-learning planners and gives a new way of leveraging relaxed planning techniques in the context of learning.
Learning Domain-Independent Planning Heuristics with Hypergraph Networks
TLDR
This work presents the first approach capable of learning domain-independent planning heuristics entirely from scratch, and shows that the learned heuristics are able to generalise across different problems and domains, including to domains that were not seen during training.
Learning heuristic functions for large state spaces