• Corpus ID: 103456

# A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning

@inproceedings{Ross2011ARO,
title={A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning},
author={St{\'e}phane Ross and Geoffrey J. Gordon and J. Andrew Bagnell},
booktitle={AISTATS},
year={2011}
}
• Published in AISTATS 2 November 2010
• Computer Science, Mathematics
Sequential prediction problems such as imitation learning, where future observations depend on previous predictions (actions), violate the common i.i.d. [...]

Key Method: We show that any such no-regret algorithm, combined with additional reduction assumptions, must find a policy with good performance under the distribution of observations it induces in such sequential settings. We demonstrate that this new approach outperforms previous approaches on two challenging imitation learning problems and a benchmark…
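The Key Method summary describes a learner trained on the distribution of observations its own policy induces. The loop below is a minimal sketch of a DAgger-style dataset-aggregation loop under loudly stated assumptions: the 1-D chain environment, the `expert`, the tabular majority-vote learner, and the mixing schedule (β = 1 on the first iteration, 0 afterwards) are hypothetical choices for illustration. The sketch returns the last learner rather than choosing among the iterates.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy task: a 1-D chain of states 0..9 with the goal in the middle.
N_STATES, GOAL = 10, 5

def expert(s):
    """Hypothetical expert: step toward GOAL (+1, 0 or -1)."""
    return (s < GOAL) - (s > GOAL)

def step(s, a):
    """Deterministic dynamics, clipped to the ends of the chain."""
    return min(max(s + a, 0), N_STATES - 1)

def train(dataset):
    """Hypothetical learner: majority expert label per state, +1 for unseen states."""
    votes = defaultdict(Counter)
    for s, a in dataset:
        votes[s][a] += 1
    table = {s: c.most_common(1)[0][0] for s, c in votes.items()}
    return lambda s: table.get(s, 1)

def dagger(n_iters=5, horizon=12, seed=0):
    rng = random.Random(seed)
    data = []                              # aggregated dataset D
    learner = train(data)
    for i in range(n_iters):
        beta = 1.0 if i == 0 else 0.0      # illustrative mixing schedule
        s = rng.randrange(N_STATES)
        for _ in range(horizon):
            data.append((s, expert(s)))    # expert labels the state actually visited
            # Mixed policy: follow the expert with probability beta, else the learner.
            a = expert(s) if rng.random() < beta else learner(s)
            s = step(s, a)
        learner = train(data)              # retrain on everything aggregated so far
    return learner, data
```

Calling `learner, data = dagger()` returns the final policy and the aggregated dataset. The key design point is that later iterations collect expert labels on states reached by the learner, so training data follows the learner's own state distribution rather than the expert's.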
## 1,710 Citations
Imitation Learning
• Cmpt
• Computer Vision
• 2021
Sequential prediction problems such as imitation learning, where future observations depend on previous predictions (actions), violate the common i.i.d. assumptions made in statistical learning. This…
Reinforcement and Imitation Learning via Interactive No-Regret Learning
• Computer Science, Mathematics
ArXiv
• 2014
This work develops an interactive imitation learning approach that leverages cost information and extends the technique to address reinforcement learning, suggesting a broad new family of algorithms and providing a unifying view of existing techniques for imitation and reinforcement learning.
Explaining fast improvement in online imitation learning
• Xinyan Yan, Ching-An Cheng
• Computer Science
UAI
• 2021
It is proved that, after N rounds of online IL with stochastic feedback, the policy improves in $\tilde{O}(1/N + \sqrt{\xi/N})$ in both expectation and high probability.
Exponentially Weighted Imitation Learning for Batched Historical Data
• Computer Science
NeurIPS
• 2018
This work proposes a monotonic advantage-reweighted imitation learning strategy that is applicable to problems with complex nonlinear function approximation, works well with hybrid (discrete and continuous) action spaces, and can learn from data generated by an unknown policy.
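Advantage reweighting can be illustrated as weighted behavior cloning. The sketch below assumes a per-sample weight of exp(β·Â(s, a)) with the advantage estimates Â already given, and a tabular categorical policy, for which the weighted maximum-likelihood solution is just normalized weighted counts. The function name and data layout are hypothetical.

```python
import math
from collections import defaultdict

def weighted_bc(samples, beta=1.0):
    """Tabular weighted max-likelihood policy.

    samples: iterable of (state, action, advantage) triples (advantages assumed given).
    Returns pi[(s, a)] proportional to the sum of exp(beta * advantage) over matching samples.
    """
    num = defaultdict(float)   # weighted count per (state, action)
    den = defaultdict(float)   # total weight per state
    for s, a, adv in samples:
        w = math.exp(beta * adv)
        num[(s, a)] += w
        den[s] += w
    return {(s, a): w / den[s] for (s, a), w in num.items()}
```

With `beta=0` every sample has weight 1 and this reduces to plain behavior cloning; larger `beta` shifts probability toward actions with higher estimated advantage.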
Active Imitation Learning: Formal and Practical Reductions to I.I.D. Learning
• Computer Science
J. Mach. Learn. Res.
• 2014
This paper considers active imitation learning with the goal of reducing the expert effort required for learning by querying the expert about the desired action at individual states, which are selected based on answers to past queries and the learner's interactions with an environment simulator.
Imitation Learning by Coaching
• Computer Science
NIPS
• 2012
By a reduction of learning by demonstration to online learning, it is proved that coaching can yield a lower regret bound than using the oracle, and the method outperforms state-of-the-art imitation learning methods as well as two static feature selection methods on dynamic feature selection.
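A coach sits between the oracle and the learner. The sketch below assumes the coach picks the action that maximizes λ times the learner's own score minus the task loss, so λ = 0 recovers the oracle and large λ defers to the learner. The scoring and loss functions and the `coach_action` name are hypothetical placeholders.

```python
def coach_action(actions, learner_score, task_loss, lam):
    """Assumed coaching rule: argmax over actions of lam * learner_score(a) - task_loss(a)."""
    return max(actions, key=lambda a: lam * learner_score(a) - task_loss(a))

# Hypothetical scores and losses for three candidate actions.
actions = ["a", "b", "c"]
score = {"a": -5.0, "b": 2.0, "c": 3.0}.get   # learner prefers "c"
loss = {"a": 0.0, "b": 0.1, "c": 2.0}.get      # oracle action is "a"
```

Here λ = 0 yields the oracle's `"a"`, λ = 1 yields `"b"` (cheap and reachable for the learner), and λ = 100 yields the learner's favorite `"c"`.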
Fast Policy Learning through Imitation and Reinforcement
• Computer Science, Mathematics
UAI
• 2018
LOKI, a strategy for policy learning that first performs a small but random number of IL iterations before switching to a policy gradient RL method, is proposed and it is shown that if the switching time is properly randomized, LOKI can learn to outperform a suboptimal expert and converge faster than running policy gradient from scratch.
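The switching rule described above can be sketched as a control loop. This assumes the switching time K is drawn uniformly from an integer range [n_min, n_max]; `il_step` and `pg_step` are hypothetical callables standing in for one imitation-learning update and one policy-gradient update.

```python
import random

def loki(policy, il_step, pg_step, n_min, n_max, n_total, rng):
    """Run K imitation-learning updates (K random), then policy-gradient updates up to n_total."""
    assert 0 <= n_min <= n_max <= n_total
    k = rng.randint(n_min, n_max)          # random switching time, inclusive bounds
    for t in range(n_total):
        policy = il_step(policy) if t < k else pg_step(policy)
    return policy, k
```

Passing functions that append tags to a tuple makes the schedule visible, e.g. `loki((), lambda p: p + ("il",), lambda p: p + ("pg",), 3, 7, 20, random.Random(1))`.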
Model-Based Imitation Learning with Accelerated Convergence
• Computer Science
ArXiv
• 2018
Two model-based algorithms inspired by Follow-the-Leader with prediction are proposed, based on solving variational inequalities and stochastic first-order updates, that can provably accelerate the convergence rate of online imitation learning, making it more sample efficient.
Accelerating Imitation Learning with Predictive Models
• Computer Science, Mathematics
AISTATS
• 2019
Two model-based algorithms inspired by Follow-the-Leader with prediction are proposed, one based on solving variational inequalities and one, MoBIL-Prox, based on stochastic first-order updates; both can provably accelerate the best known convergence rate by up to an order.
Iterative Noise Injection for Scalable Imitation Learning
An improved bound on the loss due to covariate shift is proved, and an algorithm that uses the analysis to estimate the level of ε-greedy noise to inject is introduced; it achieves better performance than DAgger with 75% fewer demonstrations.
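Noise injection during demonstration collection can be sketched as an ε-greedy rollout of the expert in which every visited state is still labeled with the clean expert action. This only shows injection at a given ε; how the cited work estimates ε is not reproduced. `expert`, `step`, and the action set are hypothetical inputs.

```python
import random

def noisy_demo(expert, step, start, horizon, epsilon, actions, rng):
    """Roll out the expert with epsilon-greedy noise; label each state with the expert action."""
    s, data = start, []
    for _ in range(horizon):
        a_star = expert(s)
        data.append((s, a_star))                 # label is always the noise-free expert action
        # With probability epsilon execute a random action, so the demo visits off-expert states.
        a = rng.choice(actions) if rng.random() < epsilon else a_star
        s = step(s, a)
    return data
```

With `epsilon=0` this is an ordinary expert demonstration; raising `epsilon` widens the set of visited states while keeping the labels clean.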

## References

Showing 1-10 of 25 references
Efficient Reductions for Imitation Learning
• Computer Science
AISTATS
• 2010
This work proposes two alternative algorithms for imitation learning where training occurs over several episodes of interaction and shows that this leads to stronger performance guarantees and improved performance on two challenging problems: training a learner to play a 3D racing game and Mario Bros.
Apprenticeship learning via inverse reinforcement learning
• Computer Science
ICML
• 2004
This work models the expert as trying to maximize a reward function that is expressible as a linear combination of known features, and gives an algorithm for learning the task demonstrated by the expert that uses "inverse reinforcement learning" to try to recover the unknown reward function.
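Two ingredients of this approach can be sketched in plain Python: the empirical discounted feature expectations μ = mean over trajectories of Σ_t γ^t φ(s_t), and one step of a projection-style update that moves a running estimate toward the expert's μ_E and returns the next reward weights w = μ_E − μ̄. The feature map `phi`, the trajectories, and the omitted RL step that produces each new μ are assumptions for illustration.

```python
import math

def feature_expectations(trajectories, phi, gamma):
    """Average over trajectories of sum_t gamma^t * phi(s_t)."""
    k = len(phi(trajectories[0][0]))
    mu = [0.0] * k
    for traj in trajectories:
        for t, s in enumerate(traj):
            for j, f in enumerate(phi(s)):
                mu[j] += gamma ** t * f / len(trajectories)
    return mu

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def projection_step(mu_bar, mu_new, mu_expert):
    """Project mu_expert onto the line through mu_bar and mu_new; return (mu_bar', w, ||w||)."""
    d = [a - b for a, b in zip(mu_new, mu_bar)]
    coef = dot(d, [e - b for e, b in zip(mu_expert, mu_bar)]) / dot(d, d)
    mu_bar_next = [b + coef * x for b, x in zip(mu_bar, d)]
    w = [e - b for e, b in zip(mu_expert, mu_bar_next)]   # next reward weights
    return mu_bar_next, w, math.sqrt(dot(w, w))           # ||w|| serves as a stopping margin
```

In a full loop, `w` would define the reward for the next RL solve, whose policy's feature expectations become the next `mu_new`; the loop stops once the margin falls below a tolerance.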
Boosting Structured Prediction for Imitation Learning
• Computer Science
NIPS
• 2006
A novel approach, MMPBOOST, is provided, based on the functional gradient descent view of boosting, that extends MMP by "boosting" in new features by using simple binary classification or regression to improve performance of MMP imitation learning.
(Online) Subgradient Methods for Structured Prediction
• Computer Science
• 2007
This work proposes using simple subgradient-based techniques for optimizing a regularized risk formulation of structured learning problems in both online and batch settings, and analyzes the theoretical convergence, generalization, and robustness properties of the resulting techniques.
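A subgradient step on a regularized structured hinge loss can be sketched with multiclass classification as the simplest structured output. This assumes a joint feature map that places x in the block of label y, a 0/1 task loss, and loss-augmented argmax by enumeration. The update is w ← w − η(λw + φ(x, ŷ) − φ(x, y)). In genuinely structured problems the enumeration would be replaced by loss-augmented inference. All names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def loss_augmented_argmax(W, x, y_true, labels):
    """argmax_y of w_y . x + L(y_true, y), with an assumed 0/1 task loss."""
    return max(labels, key=lambda y: dot(W[y], x) + (0.0 if y == y_true else 1.0))

def subgradient_step(W, x, y_true, labels, lam, eta):
    """One step of w <- w - eta * (lam * w + phi(x, y_hat) - phi(x, y_true))."""
    y_hat = loss_augmented_argmax(W, x, y_true, labels)
    for y in labels:                                   # regularizer term lam * w
        W[y] = [(1 - eta * lam) * wi for wi in W[y]]
    if y_hat != y_true:                                # feature terms cancel when y_hat == y_true
        W[y_hat] = [wi - eta * xi for wi, xi in zip(W[y_hat], x)]
        W[y_true] = [wi + eta * xi for wi, xi in zip(W[y_true], x)]
    return W

def predict(W, x, labels):
    return max(labels, key=lambda y: dot(W[y], x))
```

Running the same step over a stream of examples gives the online variant; cycling over a fixed dataset gives a batch subgradient method.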
Logarithmic regret algorithms for online convex optimization
• Mathematics, Computer Science
Machine Learning
• 2007
Several algorithms achieving logarithmic regret are proposed which, besides being more general, are also much more efficient to implement; they include an efficient algorithm based on the Newton method for optimization, a new tool in the field.
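One logarithmic-regret method for H-strongly convex losses is online gradient descent with step size η_t = 1/(H t), whose regret is at most (G²/2H)(1 + log T) when gradients are bounded by G. The sketch below runs it on hypothetical quadratic losses f_t(x) = (H/2)(x − z_t)²; on these losses the update reduces exactly to playing the running mean of the past targets.

```python
import math
import random

def ogd_strongly_convex(zs, H=1.0, x1=0.0):
    """OGD with step 1/(H t) on f_t(x) = H/2 * (x - z_t)^2; returns the points played."""
    x, played = x1, []
    for t, z in enumerate(zs, start=1):
        played.append(x)                 # x_t is committed before z_t is revealed
        grad = H * (x - z)
        x = x - grad / (H * t)
    return played

def regret(zs, played, H=1.0):
    """Cumulative loss of the played points minus that of the best fixed point."""
    best = sum(zs) / len(zs)             # minimizer of the summed quadratic losses
    loss = lambda x, z: 0.5 * H * (x - z) ** 2
    return sum(loss(x, z) for x, z in zip(played, zs)) - sum(loss(best, z) for z in zs)
```

With targets in [0, 1] and `x1=0.0`, all iterates stay in [0, 1], so G ≤ H and the regret after T rounds is at most (H/2)(1 + log T).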
On the generalization ability of on-line learning algorithms
• Computer Science, Mathematics
IEEE Transactions on Information Theory
• 2004
This paper shows how to extract a hypothesis with small risk from the ensemble of hypotheses generated by an on-line learning algorithm, proves tight data-dependent bounds for the risk of this hypothesis in terms of an easily computable statistic $M_n$ associated with the on-line performance of the ensemble, and obtains risk tail bounds for kernel perceptron algorithms in terms of the spectrum of the empirical kernel matrix.
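An online-to-batch conversion of this kind can be sketched as follows. Each hypothesis h_{t-1} is scored on the next fresh example z_t before the update, the average of those losses is the statistic M_n, and for a convex loss the averaged hypothesis is one natural output whose risk can be bounded by M_n plus a deviation term of order sqrt(ln(1/δ)/n). Scalar hypotheses and the `update` and `loss` callables are assumptions for illustration.

```python
def online_to_batch(stream, update, loss, h0):
    """Return (average hypothesis, M_n) for an online learner run once over an i.i.d. stream."""
    h, hyps, online_losses = h0, [], []
    for z in stream:
        hyps.append(h)                      # h_{t-1}, chosen before seeing z_t
        online_losses.append(loss(h, z))    # loss on a fresh, not-yet-trained-on example
        h = update(h, z)
    m_n = sum(online_losses) / len(online_losses)
    h_avg = sum(hyps) / len(hyps)           # averaging assumes a convex loss in h
    return h_avg, m_n
```

For example, a learner that moves halfway toward each target, `online_to_batch([1.0] * 4, lambda h, z: h + 0.5 * (z - h), lambda h, z: (h - z) ** 2, 0.0)`, plays 0, 0.5, 0.75, 0.875.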
On the Generalization Ability of Online Strongly Convex Programming Algorithms
• Computer Science
NIPS
• 2008
A sharp bound is proved on the excess risk of the output of an online algorithm in terms of its average regret, which allows recent algorithms with logarithmic cumulative regret guarantees to achieve fast convergence rates for the excess risk with high probability.
Interactive Policy Learning through Confidence-Based Autonomy
• Computer Science
J. Artif. Intell. Res.
• 2009
The algorithm selects demonstrations based on a measure of action-selection confidence, and results show that, using Confident Execution, the agent requires fewer demonstrations to learn the policy than when demonstrations are selected by a human teacher.
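The confidence-gated behavior summarized above can be sketched as a single decision step: act autonomously when the classifier's confidence clears a threshold, otherwise ask the teacher and keep the answer as a new training example. The `classify` and `request_demo` callables and the fixed threshold are hypothetical.

```python
def confident_execution(state, classify, threshold, request_demo, demos):
    """Act autonomously if confident enough; otherwise request and record a demonstration."""
    action, confidence = classify(state)
    if confidence >= threshold:
        return action                            # autonomous execution
    teacher_action = request_demo(state)
    demos.append((state, teacher_action))        # new labeled example for retraining
    return teacher_action
```

Demonstrations are therefore requested only in states where the current policy is unsure, which is how the agent can end up needing fewer of them.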
A survey of robot learning from demonstration
• Computer Science
Robotics Auton. Syst.
• 2009
A comprehensive survey of robot Learning from Demonstration (LfD), a technique that develops policies from example state to action mappings, which analyzes and categorizes the multiple ways in which examples are gathered, as well as the various techniques for policy derivation.
Fast Rates for Regularized Objectives
• Computer Science, Mathematics
NIPS
• 2008
It is shown that the value attained by the empirical minimizer converges to the optimal value at rate 1/n, which is essential for obtaining a certain type of oracle inequality for SVMs.