Corpus ID: 44117621

# Maximum Causal Tsallis Entropy Imitation Learning

@article{Lee2018MaximumCT,
  title={Maximum Causal Tsallis Entropy Imitation Learning},
  author={Kyungjae Lee and Sungjoon Choi and Songhwai Oh},
  journal={ArXiv},
  year={2018},
  volume={abs/1805.08336}
}
• Published 2018
• Computer Science, Mathematics
• ArXiv
In this paper, we propose a novel maximum causal Tsallis entropy (MCTE) framework for imitation learning which can efficiently learn a sparse multi-modal policy distribution from demonstrations. We provide the full mathematical analysis of the proposed framework. First, the optimal solution of an MCTE problem is shown to be a sparsemax distribution, whose supporting set can be adjusted. The proposed method has advantages over a softmax distribution in that it can exclude unnecessary actions by…
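The sparsemax distribution mentioned in the abstract is the Euclidean projection of a score vector onto the probability simplex. Below is a minimal NumPy sketch of that projection under the standard closed-form thresholding formulation; the function name and variable names are illustrative, not taken from the paper:

```python
import numpy as np

def sparsemax(z):
    """Project scores z onto the probability simplex.

    Unlike softmax, sparsemax can assign exactly zero probability to
    low-scoring actions, which is why an MCTE-optimal policy can
    exclude unnecessary actions from its support.
    """
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]             # scores in decreasing order
    k = np.arange(1, z.size + 1)
    cumsum = np.cumsum(z_sorted)
    # support size: largest k with 1 + k * z_(k) > sum of the top-k scores
    k_max = k[1 + k * z_sorted > cumsum][-1]
    tau = (cumsum[k_max - 1] - 1.0) / k_max  # truncation threshold
    return np.maximum(z - tau, 0.0)
```

For example, `sparsemax([2.0, 1.0, -1.0])` keeps only the top action in its support, whereas a softmax over the same scores would give every action positive probability.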
Entropic Regularization of Markov Decision Processes
• Computer Science, Mathematics
• Entropy
• 2019
A broader family of f-divergences, and more concretely α-divergences, is considered; these inherit the beneficial property of providing the policy improvement step in closed form while also yielding a corresponding dual objective for policy evaluation.
Imitation Learning as f-Divergence Minimization
• Computer Science, Mathematics
• WAFR
• 2021
This work proposes a general imitation learning framework for estimating and minimizing any f-divergence, and shows that the approximate I-projection technique is able to imitate multi-modal behaviors more reliably than GAIL and behavior cloning.
A novel imitation learning algorithm is introduced by applying a game-theoretic notion of correlated equilibrium to the generative adversarial imitation learning, equipped with queues of discriminators and agents, in contrast with the classical approach.
Semi-Supervised Imitation Learning with Mixed Qualities of Demonstrations for Autonomous Driving
• Gunmin Lee, +5 authors Songhwai Oh
• Computer Science
• ArXiv
• 2021
The experimental results demonstrate the validity of the proposed algorithm on unlabeled trajectories with mixed qualities, and hardware experiments show that the proposed method can be applied to real-world applications.
Divergence-Augmented Policy Optimization
• Computer Science
• NeurIPS
• 2019
Empirical experiments show that in the data-scarce scenario where the reuse of off-policy data becomes necessary, the method can achieve better performance than other state-of-the-art deep reinforcement learning algorithms.
MixGAIL: Autonomous Driving Using Demonstrations with Mixed Qualities
• Computer Science
• 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
• 2020
A novel method is proposed, called mixed generative adversarial imitation learning (MixGAIL), which incorporates both expert demonstrations and negative demonstrations, such as vehicle collisions, and converges faster than the other baseline methods.
A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress
• Computer Science, Mathematics
• Artif. Intell.
• 2021
The survey formally introduces the IRL problem along with its central challenges, which include accurate inference, generalizability, correctness of prior knowledge, and growth in solution complexity with problem size, and elaborates on how current methods mitigate these challenges.
Generative Adversarial Imitation Learning with Deep P-Network for Robotic Cloth Manipulation
• Computer Science
• 2019 IEEE-RAS 19th International Conference on Humanoid Robots (Humanoids)
• 2019
Experimental results suggest that P-GAIL achieves fast and stable imitation learning as well as sample efficiency in robotic cloth manipulation.
Sparse Randomized Shortest Paths Routing with Tsallis Divergence Regularization
• Computer Science, Mathematics
• Data Min. Knowl. Discov.
• 2021
The sparse RSP is a promising model of movements on a graph, balancing sparse exploitation and exploration in an optimal way, and the derived dissimilarity measures based on expected routing costs provide state-of-the-art results.
Inverse Decision Modeling: Learning Interpretable Representations of Behavior
• Computer Science
• ICML
• 2021
This paper develops an expressive, unifying perspective on inverse decision modeling: a framework for learning parameterized representations of sequential decision behavior, which formalizes the forward problem (as a normative standard), subsuming common classes of control behavior.

#### References

Showing 1-10 of 31 references
Path Consistency Learning in Tsallis Entropy Regularized MDPs
• Computer Science, Mathematics
• ICML
• 2018
A class of novel path consistency learning (PCL) algorithms, called *sparse PCL*, is proposed for the sparse ERL problem; it can work with both on-policy and off-policy data, and is empirically compared with its soft counterpart, showing its advantage, especially in problems with a large number of actions.
Infinite Time Horizon Maximum Causal Entropy Inverse Reinforcement Learning
• Mathematics, Computer Science
• IEEE Transactions on Automatic Control
• 2018
The maximum causal entropy framework is extended to the infinite time horizon setting and a gradient-based algorithm for the maximum discounted causal entropy formulation is developed that enjoys the desired feature of being model agnostic, a property that is absent in many previous IRL algorithms.
Sparse Markov Decision Processes With Causal Sparse Tsallis Entropy Regularization for Reinforcement Learning
• Computer Science, Mathematics
• IEEE Robotics and Automation Letters
• 2018
A sparse Markov decision process (MDP) with novel causal sparse Tsallis entropy regularization is proposed, along with a sparse value iteration method that solves the sparse MDP; the convergence and optimality of sparse value iteration are proved using the Banach fixed-point theorem, and the approach outperforms existing methods in terms of convergence speed and performance.
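As a brief sketch of the regularizer these works build on (the symbols below follow standard Tsallis-entropy RL notation and are not quoted from the paper): the sparse Tsallis entropy corresponds to Tsallis entropy with entropic index q = 2, and the resulting optimal policy truncates low Q-values to exactly zero.

```latex
% Sparse Tsallis entropy (q = 2) of a policy \pi:
H(\pi) = \mathbb{E}_{s}\!\left[\tfrac{1}{2}\sum_{a}\pi(a\mid s)\bigl(1-\pi(a\mid s)\bigr)\right]

% The regularized optimal policy is a sparsemax of the Q-values:
% actions with Q(s,a) below a state-dependent threshold \tau get zero mass.
\pi^{*}(a\mid s) = \max\!\bigl(Q(s,a)-\tau\bigl(Q(s,\cdot)\bigr),\,0\bigr)
```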
Maximum Entropy Inverse Reinforcement Learning
• Computer Science
• AAAI
• 2008
A probabilistic approach based on the principle of maximum entropy that provides a well-defined, globally normalized distribution over decision sequences, while providing the same performance guarantees as existing methods is developed.
Modeling purposeful adaptive behavior with the principle of maximum causal entropy
• Computer Science
• 2010
The principle of maximum causal entropy is introduced, a general technique for applying information theory to decision-theoretic, game-theoretic, and control settings where relevant information is sequentially revealed over time.
Robust Bayesian Inverse Reinforcement Learning with Sparse Behavior Noise
• Computer Science
• AAAI
• 2014
This paper develops a robust IRL framework that can accurately estimate the reward function in the presence of behavior noise; it introduces a novel latent variable characterizing the reliability of each expert action and uses a Laplace distribution as its prior.
Reinforcement Learning with Deep Energy-Based Policies
• Computer Science
• ICML
• 2017
A method for learning expressive energy-based policies for continuous states and actions, which has previously been feasible only in tabular domains, is proposed, and a new algorithm, called soft Q-learning, which expresses the optimal policy via a Boltzmann distribution, is applied.
Robust Imitation of Diverse Behaviors
• Computer Science, Mathematics
• NIPS
• 2017
A new version of GAIL is developed that is much more robust than the purely-supervised controller, especially with few demonstrations, and avoids mode collapse, capturing many diverse behaviors when GAIL on its own does not.