Corpus ID: 220264676

Self-Paced Context Evaluation for Contextual Reinforcement Learning

Theresa Eimer, André Biedenkapp, Frank Hutter, Marius Lindauer
Reinforcement learning (RL) has made significant advances in solving single problems in a given environment, but learning policies that generalize to unseen variations of a problem remains challenging. To improve sample efficiency when learning on such instances of a problem domain, we present Self-Paced Context Evaluation (SPACE). Based on self-paced learning, SPACE automatically generates instance curricula online with little computational overhead. To this end, SPACE leverages information…
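The self-paced idea behind SPACE can be illustrated with a minimal sketch: rank problem instances by an estimate of how well the current agent already handles them, train on the easiest subset first, and widen the curriculum over rounds. This is an assumption-laden illustration, not the authors' implementation; `value_estimate` and `train_step` are hypothetical placeholders for the agent's value function and its update routine.

```python
def build_curriculum(instances, value_estimate, fraction):
    """Keep the easiest `fraction` of instances for the next round.

    `value_estimate(instance)` is assumed to return the agent's current
    value estimate for that instance; higher means the instance is
    currently easier for the agent.
    """
    ranked = sorted(instances, key=value_estimate, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k]

def self_paced_training(instances, value_estimate, train_step, rounds=5):
    """Grow the instance curriculum each round, easiest-first."""
    for r in range(1, rounds + 1):
        fraction = r / rounds  # widen the curriculum over time
        for inst in build_curriculum(instances, value_estimate, fraction):
            train_step(inst)
```

In practice the value estimates would be refreshed after each round as the agent improves, which is what lets the curriculum adapt online.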


CARL: A Benchmark for Contextual and Adaptive Reinforcement Learning
CARL is proposed, a collection of well-known RL environments extended to contextual RL problems for studying generalization; it provides first evidence that disentangling representation learning of the states from policy learning with the context facilitates better generalization.
Contextualize Me -- The Case for Context in Reinforcement Learning
This work shows that theoretically optimal behavior in contextual Markov Decision Processes requires explicit context information, and introduces CARL, the first benchmark library designed for generalization based on cRL extensions of popular benchmarks.
Automated Reinforcement Learning (AutoRL): A Survey and Open Problems
This survey seeks to unify the field of AutoRL, provide a common taxonomy, discuss each area in detail and pose open problems of interest to researchers going forward.
Evolving Curricula with Regret-Based Environment Design
This paper proposes to harness the power of evolution in a principled, regret-based curriculum, which seeks to constantly produce levels at the frontier of an agent’s capabilities, resulting in curricula that start simple but become increasingly complex.
Curriculum Reinforcement Learning via Constrained Optimal Transport
This work frames the generation of a curriculum as a constrained optimal transport problem between task distributions, and shows that this way of curriculum generation can improve upon existing CRL methods, yielding high performance in a variety of tasks with different characteristics.
Learning Domain-Independent Policies for Open List Selection
This paper shows how to train a reinforcement learning agent over several heterogeneous environments, aiming at zero-shot generalization to new related domains; an analysis of different policies shows that prioritizing states reached via preferred operators is crucial, explaining the strong performance of LAMA.
Automated Dynamic Algorithm Configuration
The first comprehensive account of this new field of automated dynamic algorithm configuration (DAC) is given, a series of recent advances are presented, and a solid foundation for future research in this field is provided.


Self-Paced Contextual Reinforcement Learning
Empirical evaluation shows that the proposed curriculum learning scheme drastically improves sample efficiency and enables learning in scenarios with both broad and sharp target context distributions in which classical approaches perform sub-optimally.
Automatic Curriculum Learning through Value Disagreement
This work introduces a goal proposal module that prioritizes goals that maximize the epistemic uncertainty of the Q-function of the policy, and samples goals that are neither too hard nor too easy for the agent to solve, hence enabling continual improvement.
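The value-disagreement idea above can be sketched with a small example: score candidate goals by how much an ensemble of Q-value estimates disagrees (a common proxy for epistemic uncertainty) and propose the most uncertain goals first. This is a hedged sketch of the general technique, not the paper's exact method; `q_ensemble` is a hypothetical list of Q-value callables.

```python
import statistics

def rank_goals_by_disagreement(goals, q_ensemble):
    """Return goals ordered by ensemble disagreement, most uncertain first.

    Disagreement is measured as the population variance of the ensemble's
    value predictions for each goal.
    """
    def disagreement(goal):
        predictions = [q(goal) for q in q_ensemble]
        return statistics.pvariance(predictions)
    return sorted(goals, key=disagreement, reverse=True)
```

Goals the ensemble agrees are easy (or hopeless) score low, so sampling from the top of this ranking concentrates training at the frontier of the agent's competence.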
Reverse Curriculum Generation for Reinforcement Learning
This work proposes a method to learn goal-oriented tasks without requiring any prior knowledge other than a single state in which the task is achieved, and generates a curriculum of start states that adapts to the agent's performance, leading to efficient training on goal-oriented tasks.
D4RL: Datasets for Deep Data-Driven Reinforcement Learning
This work introduces benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL, and releases benchmark tasks and datasets with a comprehensive evaluation of existing algorithms and an evaluation protocol together with an open-source codebase.
Automatic Goal Generation for Reinforcement Learning Agents
This work uses a generator network to propose tasks for the agent to try to achieve, specified as goal states, and shows that, by using this framework, an agent can efficiently and automatically learn to perform a wide set of tasks without requiring any prior knowledge of its environment.
Trajectory-wise Multiple Choice Learning for Dynamics Generalization in Reinforcement Learning
A new model-based RL algorithm, coined trajectory-wise multiple choice learning, learns a multi-headed dynamics model for dynamics generalization and incorporates context learning, which encodes dynamics-specific information from past experiences into a context latent vector, enabling the model to adapt online to unseen environments.
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems.
Building Self-Play Curricula Online by Playing with Expert Agents in Adversarial Games
Empirical evaluation indicates that SEPLEM, by iteratively building a curriculum of simulated tasks, achieves better performance than both playing only against the expert and using pure self-play techniques to accelerate learning in multiagent adversarial tasks.
Curriculum learning
It is hypothesized that curriculum learning has both an effect on the speed of convergence of the training process to a minimum and on the quality of the local minima obtained: curriculum learning can be seen as a particular form of continuation method (a general strategy for global optimization of non-convex functions).
Hindsight Experience Replay
A novel technique is presented which allows sample-efficient learning from rewards which are sparse and binary and therefore avoid the need for complicated reward engineering and may be seen as a form of implicit curriculum.
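The hindsight relabeling that makes this implicit curriculum possible can be sketched in a few lines: after a failed episode, pretend the final achieved state was the goal all along and recompute rewards accordingly. This is a minimal illustration under assumed data layout (each step is a dict with `achieved`, `goal`, and `reward` keys), not the paper's implementation.

```python
def hindsight_relabel(trajectory, reward_fn):
    """Relabel a trajectory with its final achieved state as the goal.

    `reward_fn(achieved, goal)` is assumed to recompute the sparse reward;
    the original trajectory is left unmodified.
    """
    achieved_goal = trajectory[-1]["achieved"]
    relabeled = []
    for step in trajectory:
        new_step = dict(step)  # copy so the original transition survives
        new_step["goal"] = achieved_goal
        new_step["reward"] = reward_fn(step["achieved"], achieved_goal)
        relabeled.append(new_step)
    return relabeled
```

Because the final state is by construction "reached," at least one relabeled transition carries a positive reward, giving the learner signal even when the original goal was never achieved.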