Corpus ID: 241032813

Curriculum Offline Imitation Learning

@article{Liu2021CurriculumOI,
  title={Curriculum Offline Imitation Learning},
  author={Minghuan Liu and Hanye Zhao and Zhengyu Yang and Jian Shen and Weinan Zhang and Li Zhao and Tie-Yan Liu},
  journal={ArXiv},
  year={2021},
  volume={abs/2111.02056}
}
Offline reinforcement learning (RL) tasks require the agent to learn from a pre-collected dataset with no further interaction with the environment. Despite the potential to surpass the behavioral policies, RL-based methods are generally impractical due to training instability and the bootstrapping of extrapolation errors, which always require careful hyperparameter tuning via online evaluation. In contrast, offline imitation learning (IL) has no such issues, since it learns the policy directly…
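
To make the contrast concrete, the sketch below shows the simplest offline IL route, behavioral cloning on a fixed dataset, which is what "learns the policy directly" refers to; the dataset shapes, network sizes, and training loop are illustrative assumptions, not the paper's implementation.

import torch
import torch.nn as nn

# Hypothetical offline dataset: states and the actions the behavior policy
# took in them (continuous control assumed, dimensions chosen arbitrarily).
states = torch.randn(10_000, 17)
actions = torch.randn(10_000, 6)

# A simple deterministic policy network.
policy = nn.Sequential(
    nn.Linear(17, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 6),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

# Behavioral cloning: regress the dataset actions directly; there is no value
# bootstrapping and no interaction with the environment.
for _ in range(1_000):
    idx = torch.randint(0, states.shape[0], (256,))
    loss = ((policy(states[idx]) - actions[idx]) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()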

References

MOPO: Model-based Offline Policy Optimization

TLDR
A new model-based offline RL algorithm is proposed that applies the variance of a Lipschitz-regularized model as a penalty to the reward function; this algorithm is found to outperform both standard model-based RL methods and existing state-of-the-art model-free offline RL approaches on existing offline RL benchmarks, as well as on two challenging continuous control tasks.
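
The construction behind this penalty can be summarized as optimizing the policy inside the learned model under an uncertainty-penalized reward; the notation below (uncertainty estimate u and penalty coefficient \lambda) is a sketch of the usual formulation rather than a quotation from the paper:

\tilde{r}(s, a) = \hat{r}(s, a) - \lambda \, u(s, a)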

On Value Discrepancy of Imitation Learning

TLDR
A framework for analyzing the theoretical properties of imitation learning approaches based on discrepancy propagation analysis; the analysis implies that GAIL suffers fewer compounding errors than behavioral cloning, which is verified empirically in this paper, indicating that the proposed framework is a general tool for analyzing imitation learning approaches.
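
The "fewer compounding errors" claim is commonly expressed through how the one-step imitation error \epsilon propagates over the horizon H; the bounds below reflect that standard quadratic-versus-linear dependence and should be read as a sketch, since the paper's exact error measures and constants are not reproduced here:

J(\pi_E) - J(\pi_{\mathrm{BC}}) \lesssim \epsilon H^{2}, \qquad J(\pi_E) - J(\pi_{\mathrm{GAIL}}) \lesssim \epsilon H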

NeoRL: A Near Real-World Benchmark for Offline Reinforcement Learning

TLDR
A near real-world offline RL benchmark, named NeoRL, is presented, containing datasets from various domains with controlled sizes as well as extra test datasets for policy validation; it is argued that the performance of a policy should also be compared against the deterministic version of the behavior policy, instead of the dataset reward.

MOReL : Model-Based Offline Reinforcement Learning

TLDR
Theoretically, it is shown that MOReL is minimax optimal (up to log factors) for offline RL, and through experiments, it matches or exceeds state-of-the-art results in widely studied offline RL benchmarks.

Efficient Reductions for Imitation Learning

TLDR
This work proposes two alternative algorithms for imitation learning in which training occurs over several episodes of interaction, and shows that this leads to stronger performance guarantees and improved empirical performance on two challenging problems: training a learner to play a 3D racing game and Mario Bros.

BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning

TLDR
This work proposes a new algorithm, Best-Action Imitation Learning (BAIL), which learns a V function, uses the V function to select actions it believes to be high-performing, and then uses those actions to train a policy network using imitation learning.
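
A rough sketch of that select-then-clone pipeline is given below; the selection rule, the ratio x, and the toy data are assumptions for illustration and do not reproduce BAIL's exact upper-envelope fit.

import numpy as np

def select_best_actions(states, actions, returns, v_estimates, x=0.7):
    # Keep (state, action) pairs whose Monte-Carlo return reaches at least a
    # fraction x of the estimated value V(s) at that state.
    keep = returns >= x * v_estimates
    return states[keep], actions[keep]

# Toy usage with random placeholders for the dataset and the value estimates.
rng = np.random.default_rng(0)
S = rng.normal(size=(1000, 17))
A = rng.normal(size=(1000, 6))
G = rng.uniform(0.0, 100.0, size=1000)        # per-pair returns
V = G + rng.uniform(0.0, 20.0, size=1000)     # value estimates sitting above the returns
best_S, best_A = select_best_actions(S, A, G, V)
# best_S, best_A would then be fed to ordinary behavioral cloning.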

Conservative Q-Learning for Offline Reinforcement Learning

TLDR
Conservative Q-learning (CQL) is proposed, which aims to address the limitations of offline RL methods by learning a conservative Q-function such that the expected value of a policy under this Q-function lower-bounds its true value.
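
The conservatism enters as a regularizer added to the usual Bellman error; the expression below is the commonly cited CQL(H) form, written here as a sketch with the trade-off weight \alpha, the empirical behavior policy \hat{\pi}_\beta, and the empirical Bellman operator \hat{\mathcal{B}}^{\pi} as they are usually defined:

\min_{Q} \; \alpha \, \mathbb{E}_{s \sim \mathcal{D}} \Big[ \log \sum_{a} \exp Q(s, a) - \mathbb{E}_{a \sim \hat{\pi}_\beta(\cdot \mid s)} \big[ Q(s, a) \big] \Big] + \tfrac{1}{2} \, \mathbb{E}_{(s, a, s') \sim \mathcal{D}} \Big[ \big( Q(s, a) - \hat{\mathcal{B}}^{\pi} \hat{Q}(s, a) \big)^{2} \Big]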

A Divergence Minimization Perspective on Imitation Learning Methods

TLDR
A unified probabilistic perspective on IL algorithms based on divergence minimization is presented, conclusively identifying that IRL's state-marginal matching objective contributes most to its superior performance, and the resulting understanding of IL methods is applied to the problem of state-marginal matching.
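
In that view, imitation learning is cast as matching occupancy (marginal) distributions between the learner and the expert; a generic statement of the objective, with D_f an f-divergence and \rho a state(-action) marginal, is sketched below as an assumption about the framework's notation:

\min_{\pi} \; D_f\big( \rho_{\pi} \,\big\|\, \rho_{E} \big)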

Generative Adversarial Imitation Learning

TLDR
A new general framework for directly extracting a policy from data, as if it were obtained by reinforcement learning following inverse reinforcement learning, is proposed, and a certain instantiation of this framework draws an analogy between imitation learning and generative adversarial networks.
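
The adversarial instantiation referred to here is the standard GAIL min-max game between a policy \pi and a discriminator D, with a causal-entropy bonus H(\pi); the objective is reproduced for orientation:

\min_{\pi} \max_{D} \; \mathbb{E}_{\pi}\big[ \log D(s, a) \big] + \mathbb{E}_{\pi_E}\big[ \log\big( 1 - D(s, a) \big) \big] - \lambda H(\pi)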

Exponentially Weighted Imitation Learning for Batched Historical Data

TLDR
A monotonic advantage-reweighted imitation learning strategy that is applicable to problems with complex nonlinear function approximation, works well with hybrid (discrete and continuous) action spaces, and can be used to learn from data generated by an unknown policy.
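
The reweighting amounts to behavioral cloning with exponential advantage weights; the expression below sketches that idea, with \beta a temperature and \hat{A} an advantage estimate computed from the batched data (both are notational assumptions here):

\max_{\pi} \; \mathbb{E}_{(s, a) \sim \mathcal{D}} \big[ \exp\big( \beta \hat{A}(s, a) \big) \log \pi(a \mid s) \big]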