Corpus ID: 209202457

Imitation Learning via Off-Policy Distribution Matching

@article{Kostrikov2020ImitationLV,
  title={Imitation Learning via Off-Policy Distribution Matching},
  author={Ilya Kostrikov and Ofir Nachum and J. Tompson},
  journal={ArXiv},
  year={2020},
  volume={abs/1912.05032}
}
When performing imitation learning from expert demonstrations, distribution matching is a popular approach, in which one alternates between estimating distribution ratios and then using these ratios as rewards in a standard reinforcement learning (RL) algorithm. Traditionally, estimation of the distribution ratio requires on-policy data, which has caused previous work to either be exorbitantly data-inefficient or alter the original objective in a manner that can drastically change its optimum…
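The alternation the abstract describes, estimating a distribution ratio and reusing it as a reward, can be illustrated with a toy sketch. This is not the paper's off-policy ValueDICE estimator; it is a minimal GAIL-style stand-in, with hypothetical 1-D "state-action" samples, where a logistic discriminator D is trained to separate expert from policy data and the log density ratio log(D / (1 - D)) serves as the imitation reward:

```python
# Toy sketch of distribution-ratio rewards (hypothetical setup, not ValueDICE):
# train a logistic discriminator D on expert vs. policy samples; for a logistic
# D, log(D / (1 - D)) is exactly the logit, i.e. the estimated log density ratio.
import numpy as np

rng = np.random.default_rng(0)

# 1-D "state-action" samples: expert centered at +1, current policy at -1.
expert = rng.normal(loc=1.0, scale=0.5, size=500)
policy = rng.normal(loc=-1.0, scale=0.5, size=500)

# Logistic-regression discriminator trained by gradient ascent on
# E_expert[log D] + E_policy[log(1 - D)].
w, b, lr = 0.0, 0.0, 0.1
for _ in range(200):
    d_e = 1.0 / (1.0 + np.exp(-(w * expert + b)))  # D on expert samples
    d_p = 1.0 / (1.0 + np.exp(-(w * policy + b)))  # D on policy samples
    grad_w = np.mean((1 - d_e) * expert) - np.mean(d_p * policy)
    grad_b = np.mean(1 - d_e) - np.mean(d_p)
    w += lr * grad_w
    b += lr * grad_b

def imitation_reward(x):
    """log(D / (1 - D)): estimated log ratio of expert to policy density."""
    return w * x + b  # the logit of a logistic discriminator

# Expert-like samples receive higher reward than policy-like ones,
# so maximizing this reward pushes the policy toward the expert distribution.
print(imitation_reward(1.0) > imitation_reward(-1.0))  # True
```

In an actual imitation-learning loop this reward would be fed to an RL algorithm and the discriminator re-estimated as the policy changes; the paper's contribution is doing the ratio estimation off-policy, which this on-policy sketch does not capture.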

Papers citing this paper

  • AlgaeDICE: Policy Gradient from Arbitrary Experience
  • Strictly Batch Imitation Learning by Energy-based Distribution Matching
  • Efficient Imitation Learning with Local Trajectory Optimization
  • Adversarial Soft Advantage Fitting: Imitation Learning without Policy Optimization
  • Learning Sparse Rewarded Tasks from Sub-Optimal Demonstrations
  • Stable Policy Optimization via Off-Policy Divergence Regularization
  • Concurrent Training Improves the Performance of Behavioral Cloning from Observation
  • Imitation Learning with Sinkhorn Distances
  • Reinforcement Learning via Fenchel-Rockafellar Duality

    References

Publications referenced by this paper (partial list; the paper cites 26 references):
  • DualDICE: Efficient Estimation of Off-Policy Stationary Distribution Corrections
  • A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
  • Apprenticeship Learning via Inverse Reinforcement Learning
  • Generative Adversarial Imitation Learning
  • Discriminator-Actor-Critic: Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning
  • Learning Robust Rewards with Adversarial Inverse Reinforcement Learning
  • Soft Actor-Critic Algorithms and Applications
  • Relative Entropy Policy Search
  • InfoGAIL: Interpretable Imitation Learning from Visual Demonstrations