Corpus ID: 235187174

Hyperparameter Selection for Imitation Learning

  title={Hyperparameter Selection for Imitation Learning},
  author={L{\'e}onard Hussenot and Marcin Andrychowicz and Damien Vincent and Robert Dadashi and Anton Raichuk and Lukasz Stafiniak and Sertan Girgin and Rapha{\"e}l Marinier and Nikola Momchev and Sabela Ramos and Manu Orsini and Olivier Bachem and Matthieu Geist and Olivier Pietquin},
We address the issue of tuning hyperparameters (HPs) for imitation learning algorithms in the context of continuous control, when the underlying reward function of the demonstrating expert cannot be observed at any time. The vast literature on imitation learning mostly considers this reward function to be available for HP selection, but this is not a realistic setting. Indeed, were this reward function available, it could then be used directly for policy training, and imitation would not be… 
Showing Your Offline Reinforcement Learning Work: Online Evaluation Budget Matters
This work argues for the importance of an online evaluation budget for a reliable comparison of deep offline RL algorithms and demonstrates that the preference between algorithms is budget-dependent across a diverse range of decision-making domains such as Robotics, Finance, and Energy Management.
A Pragmatic Look at Deep Imitation Learning
A pragmatic look at GAIL and related imitation learning algorithms is taken, implementing and automatically tuning a range of algorithms in a unified experimental setup to present a fair evaluation of the competing methods.
What Matters in Learning from Offline Human Demonstrations for Robot Manipulation
An extensive study of six offline learning algorithms for robot manipulation, on simulated tasks and three real-world multi-stage manipulation tasks of varying complexity and with datasets of varying quality, highlights opportunities for learning from human datasets.
Rethinking ValueDice: Does It Really Improve Performance?
It is shown that ValueDice could reduce to BC under the offline setting, and it is verified that overfitting exists and regularization matters in the low-data regime.
Continuous Control with Action Quantization from Demonstrations
Experiments show that the proposed approach outperforms state-of-the-art methods such as SAC in the RL setup, and GAIL in the Imitation Learning setup.
Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation
This work proposes a new framework for imitation learning by estimating the support of the expert policy to compute a fixed reward function, which allows to re-frame imitation learning within the standard reinforcement learning setting.
Learning from Demonstrations: Is It Worth Estimating a Reward Function?
This paper provides a comparative study between Inverse Reinforcement Learning (IRL) and Apprenticeship Learning (AL), which are two frameworks used for the imitation learning problem where an agent tries to learn from demonstrations of an expert.
Hyperparameter Selection for Offline Reinforcement Learning
This work focuses on offline hyperparameter selection, i.e. methods for choosing the best policy from a set of many policies trained using different hyperparameters, given only logged data, and shows that offline RL algorithms are not robust to hyperparameter choices.
Primal Wasserstein Imitation Learning
PWIL is proposed, which ties to the primal form of the Wasserstein distance between the expert and agent state-action distributions; it presents a reward function that is derived offline and requires little fine-tuning, as opposed to recent adversarial IL algorithms that learn a reward function through interaction with the environment.
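The key property above, a reward computable offline from demonstrations alone, can be illustrated with a deliberately simplified sketch: the reward below decays with the distance to the nearest expert state-action pair. The data, the nearest-neighbour rule (PWIL's actual reward uses a greedy coupling over the episode), and the `alpha`/`beta` constants are all illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical expert demonstration features standing in for (state, action) pairs.
expert_sa = rng.normal(size=(100, 6))

def offline_reward(sa, alpha=5.0, beta=1.0):
    # Reward decays with distance to the closest expert state-action pair.
    # Computable offline from demonstrations alone: no environment interaction,
    # no learned discriminator.
    dists = np.linalg.norm(expert_sa - sa, axis=1)
    return alpha * np.exp(-beta * dists.min())

# States close to the demonstrations earn more reward than distant ones.
near = offline_reward(expert_sa[0] + 0.01)
far = offline_reward(expert_sa[0] + 5.0)
```

Because the reward is a fixed function of the demonstrations, it can be handed to any off-the-shelf RL algorithm without the instability of adversarial training.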
A Divergence Minimization Perspective on Imitation Learning Methods
A unified probabilistic perspective on IL algorithms based on divergence minimization is presented, conclusively identifying that IRL's state-marginal matching objective contributes most to its superior performance, and applying the new understanding of IL methods to the problem of state-marginal matching.
Generative Adversarial Imitation Learning
A new general framework for directly extracting a policy from data, as if it were obtained by reinforcement learning following inverse reinforcement learning, is proposed and a certain instantiation of this framework draws an analogy between imitation learning and generative adversarial networks.
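The adversarial analogy described above can be sketched with a toy logistic-regression discriminator in place of GAIL's neural networks; the synthetic data, single discriminator training loop, and the `-log(1 - D)` surrogate reward below are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy features standing in for expert and current-policy (state, action) pairs.
expert_sa = rng.normal(loc=1.0, size=(256, 4))
policy_sa = rng.normal(loc=-1.0, size=(256, 4))

w, b = np.zeros(4), 0.0  # logistic-regression discriminator parameters

def discriminator(x):
    # D(s, a): estimated probability that (s, a) came from the expert.
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

# One adversarial "inner step": train D to separate expert from policy samples
# by gradient ascent on log D(expert) + log(1 - D(policy)).
lr = 0.1
for _ in range(200):
    d_e, d_p = discriminator(expert_sa), discriminator(policy_sa)
    grad_w = expert_sa.T @ (1 - d_e) / len(d_e) - policy_sa.T @ d_p / len(d_p)
    grad_b = np.mean(1 - d_e) - np.mean(d_p)
    w += lr * grad_w
    b += lr * grad_b

def imitation_reward(sa):
    # Surrogate reward handed to the RL learner: high where D says "expert".
    return -np.log(1.0 - discriminator(sa) + 1e-8)

# Expert-like samples should receive larger rewards than policy samples.
r_e = imitation_reward(expert_sa).mean()
r_p = imitation_reward(policy_sa).mean()
```

In the full algorithm, the policy is then updated with RL against this reward and the two players alternate, exactly as generator and discriminator alternate in a GAN.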
Boosted Bellman Residual Minimization Handling Expert Demonstrations
This paper addresses the problem of batch Reinforcement Learning with Expert Demonstrations (RLED) by proposing algorithms that leverage expert data to find an optimal policy of a Markov Decision Process (MDP), using a fixed dataset of sampled transitions of the MDP as well as a fixed dataset of expert demonstrations.
Exploration by Random Network Distillation
An exploration bonus for deep reinforcement learning methods that is easy to implement and adds minimal computational overhead is introduced, together with a method to flexibly combine intrinsic and extrinsic rewards that enables significant progress on several hard-exploration Atari games.
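The bonus itself is the prediction error of a network trained to match a fixed, randomly initialized target network; it is high on novel observations and shrinks on familiar ones. A minimal NumPy sketch follows, with tiny two-layer `tanh` networks and a hand-rolled gradient step as illustrative assumptions in place of the paper's CNN architecture and optimizer.

```python
import numpy as np

rng = np.random.default_rng(0)

OBS_DIM, FEAT_DIM = 8, 16
W_target = rng.normal(size=(OBS_DIM, FEAT_DIM))  # fixed random target network
W_pred = np.zeros((OBS_DIM, FEAT_DIM))           # predictor, trained to match it

def target(obs):
    return np.tanh(obs @ W_target)

def predictor(obs):
    return np.tanh(obs @ W_pred)

def intrinsic_bonus(obs):
    # Exploration bonus = predictor's error on the target features:
    # large for observations the predictor has rarely been trained on.
    return np.sum((predictor(obs) - target(obs)) ** 2, axis=-1)

def train_step(obs, lr=1e-2):
    # One gradient step of the predictor toward the frozen target.
    global W_pred
    err = predictor(obs) - target(obs)          # (batch, FEAT_DIM)
    grad_pre = err * (1 - predictor(obs) ** 2)  # backprop through tanh
    W_pred -= lr * obs.T @ grad_pre / len(obs)

# The bonus shrinks on observations the predictor is trained on repeatedly.
seen = rng.normal(size=(64, OBS_DIM))
before = intrinsic_bonus(seen).mean()
for _ in range(500):
    train_step(seen)
after = intrinsic_bonus(seen).mean()
```

Since the target is fixed and both networks are trained on agent experience only, the bonus requires no density model and adds just one extra forward/backward pass per batch.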
Maximum Entropy Inverse Reinforcement Learning
A probabilistic approach based on the principle of maximum entropy that provides a well-defined, globally normalized distribution over decision sequences, while providing the same performance guarantees as existing methods is developed.
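In the linear-reward setting the paper considers, the globally normalized distribution over decision sequences takes the following form (with $f_\tau$ the feature counts of trajectory $\tau$ and $\theta$ the reward weights):

```latex
P(\tau \mid \theta) \;=\; \frac{\exp\!\left(\theta^{\top} f_{\tau}\right)}{Z(\theta)},
\qquad
Z(\theta) \;=\; \sum_{\tau'} \exp\!\left(\theta^{\top} f_{\tau'}\right)
```

Trajectories with equal reward are equally likely, and $\theta$ is fit so that the model's expected feature counts match those of the demonstrations.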
Imitation Learning as f-Divergence Minimization
This work proposes a general imitation learning framework for estimating and minimizing any f-divergence, and shows that the approximate I-projection technique is able to imitate multi-modal behaviors more reliably than GAIL and behavior cloning.