• Corpus ID: 5538688

Learning to Achieve Goals

@inproceedings{Kaelbling1993LearningTA,
  title={Learning to Achieve Goals},
  author={Leslie Pack Kaelbling},
  booktitle={IJCAI},
  year={1993}
}
Temporal difference methods solve the temporal credit assignment problem for reinforcement learning. [...] In addition, this paper shows how traditional relaxation techniques can be applied to the problem. Finally, experimental results are given that demonstrate the superiority of DG learning over Q learning in a moderately large, synthetic, non-deterministic domain.
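As an illustration of the goal-conditioned idea, here is a minimal tabular sketch of Q-learning indexed by goal as well as state and action, in the spirit of DG learning. The environment interface (reset/step), the 0/1 goal-achievement reward, and all hyperparameters are assumptions made for the sketch, not details taken from the paper.

import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1  # assumed hyperparameters

def dg_update(Q, s, a, r, s_next, goal, actions, done):
    """One TD backup of Q[(s, a, goal)] toward r + GAMMA * max_b Q[(s_next, b, goal)]."""
    target = r if done else r + GAMMA * max(Q[(s_next, b, goal)] for b in actions)
    Q[(s, a, goal)] += ALPHA * (target - Q[(s, a, goal)])

def run_episode(env, Q, goal, actions, max_steps=200):
    """Act epsilon-greedily toward `goal`, updating the goal-indexed table."""
    s = env.reset()
    for _ in range(max_steps):
        if random.random() < EPSILON:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda b: Q[(s, b, goal)])
        s_next = env.step(a)                # hypothetical env returning the next state
        r = 1.0 if s_next == goal else 0.0  # assumed goal-achievement reward
        dg_update(Q, s, a, r, s_next, goal, actions, done=(s_next == goal))
        if s_next == goal:
            break
        s = s_next

Q = defaultdict(float)  # Q[(state, action, goal)], zero-initialized

Because the table is indexed by goal, the same learner can be directed at any goal at test time without retraining.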
Reducing the Time Complexity of Goal-Independent Reinforcement Learning
TLDR: This paper presents a technique for reducing the update complexity of CQL to O(|A|) with little impact on performance.
Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling
TLDR: This work presents model identification and experience relabeling (MIER), a meta-reinforcement learning algorithm that is efficient and extrapolates well when faced with out-of-distribution tasks at test time.
Reducing Commitment to Tasks with Off-Policy Hierarchical Reinforcement Learning
TLDR: Modifications to several TD methods that prevent unintentional on-policy learning are presented, demonstrating that the HRL system learns efficiently without commitment to completing subtasks in a cliff-walking domain, contrary to a widespread claim in the literature that such commitment is critical for learning efficiency.
Learning and Exploiting Multiple Subgoals for Fast Exploration in Hierarchical Reinforcement Learning
  • Libo Xing
  • ArXiv
  • 2019
TLDR: A multi-goal HRL algorithm is devised, consisting of a high-level Manager policy and a low-level Worker policy, that achieves the same performance as state-of-the-art HRL methods with substantially reduced training time.
Learning Goal-Conditioned Value Functions
  • 2018
Multi-goal reinforcement learning (MGRL) addresses tasks where the desired goal state can change for every trial. State-of-the-art algorithms model these problems such that the reward formulation …
Skew-Explore: Learn faster in continuous spaces with sparse rewards
TLDR: The main contribution of this work is a novel reward function which, combined with a goal-proposing scheme, increases the entropy of visited states faster than prior work, improving the agent's exploration and its chance of solving sparse-reward problems efficiently.
Training Agents using Upside-Down Reinforcement Learning
TLDR: The first concrete implementation of UDRL is presented, and experimental results show that its performance can be surprisingly competitive with, and even exceed, that of traditional baseline algorithms developed over decades of research.
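Upside-down RL, named in the entry above, turns reward maximization into supervised learning: a policy is conditioned on a command (desired return and time horizon) and trained to reproduce actions from its own past episodes. A minimal sketch follows; the network sizes, names, and batch layout are assumptions for illustration, not the paper's implementation.

import torch
import torch.nn as nn

class CommandPolicy(nn.Module):
    """Policy conditioned on a (desired return, horizon) command."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        # input = state plus the two command scalars
        self.net = nn.Sequential(
            nn.Linear(state_dim + 2, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state, desired_return, horizon):
        cmd = torch.stack([desired_return, horizon], dim=-1)
        return self.net(torch.cat([state, cmd], dim=-1))  # action logits

def train_step(policy, optimizer, batch):
    """Supervised step: predict the action actually taken, given the return
    and time remaining that followed it in a stored episode."""
    s, ret, hor, a = batch  # tensors sampled from a replay of past episodes
    loss = nn.functional.cross_entropy(policy(s, ret, hor), a)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()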
Non-Parametric Discriminative Rewards
Learning to control an environment without hand-crafted rewards or expert data remains challenging and is at the frontier of reinforcement learning research. We present an unsupervised learning …
Evolutionary Stochastic Policy Distillation
TLDR: This work proposes a new formulation of GCRS tasks from the perspective of a drifted random walk on the state space, and designs a novel method, Evolutionary Stochastic Policy Distillation (ESPD), to solve them based on the insight of reducing the first hitting time of the stochastic process.
Maximum Entropy Gain Exploration for Long Horizon Multi-goal Reinforcement Learning
TLDR: This paper proposes to optimize this objective by having the agent pursue past achieved goals in sparsely explored areas of the goal space, which focuses exploration on the frontier of the achievable goal set.
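The goal-selection idea in the entry above can be sketched concretely: fit a density model over previously achieved goals and propose the achieved goal lying in the lowest-density region as the next behavioral goal. The kernel density estimate, bandwidth, and buffer layout below are illustrative assumptions, not the paper's exact implementation.

import numpy as np
from sklearn.neighbors import KernelDensity

def propose_goal(achieved_goals: np.ndarray, bandwidth: float = 0.1) -> np.ndarray:
    """Return the past achieved goal with minimum estimated density
    (i.e. on the sparsely explored frontier of the achieved set)."""
    kde = KernelDensity(bandwidth=bandwidth).fit(achieved_goals)
    log_density = kde.score_samples(achieved_goals)  # log p(g) for each goal
    return achieved_goals[np.argmin(log_density)]

# Usage: goals is an (N, d) array of goals the agent has reached so far.
# next_goal = propose_goal(goals)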

References

Showing 1-10 of 14 references
Self-improvement Based on Reinforcement Learning, Planning and Teaching
TLDR: Three extensions to the two basic learning algorithms are investigated, and it is shown that the extensions can effectively improve the learning rate and, in many cases, even the asymptotic performance.
Learning and Sequential Decision Making
In this report we show how the class of adaptive prediction methods that Sutton called "temporal difference," or TD, methods are related to the theory of sequential decision making. TD methods have …
Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming
TLDR: This paper extends previous work with Dyna, a class of architectures for intelligent systems based on approximating dynamic programming methods, and presents results for two Dyna architectures based on Watkins's Q-learning, a new kind of reinforcement learning.
Learning in embedded systems
TLDR: This dissertation addresses the problem of designing algorithms for learning in embedded systems; it uses Sutton's techniques for linear association and reinforcement comparison, while the interval estimation algorithm uses the statistical notion of confidence intervals to guide its generation of actions.
THE ROLE OF EXPLORATION IN LEARNING CONTROL
Whenever an intelligent agent learns to control an unknown environment, two opposing objectives have to be combined. On the one hand, the environment must be sufficiently explored in order to identify …
Hierarchical learning: Preliminary results
  • Proceedings of the Tenth International Conference on Machine Learning
  • 1993
Learning to predict by the method of temporal differences
  • 1992
Real-time learning and control using asynchronous dynamic programming
  • 1991
A Role for Anticipation in Reactive Systems that Learn
TLDR: The role of anticipation in reactive learning systems is reviewed; reactive systems depend on precompiled knowledge about the way to behave in particular situations to obtain their performance.
Learning from Delayed Rewards
  • PhD thesis, King's College, Cambridge
  • 1989