Corpus ID: 44117621

Maximum Causal Tsallis Entropy Imitation Learning

@article{Lee2018MaximumCT,
  title={Maximum Causal Tsallis Entropy Imitation Learning},
  author={Kyungjae Lee and Sungjoon Choi and Songhwai Oh},
  journal={ArXiv},
  year={2018},
  volume={abs/1805.08336}
}
In this paper, we propose a novel maximum causal Tsallis entropy (MCTE) framework for imitation learning which can efficiently learn a sparse multi-modal policy distribution from demonstrations. We provide the full mathematical analysis of the proposed framework. First, the optimal solution of an MCTE problem is shown to be a sparsemax distribution, whose supporting set can be adjusted. The proposed method has advantages over a softmax distribution in that it can exclude unnecessary actions by…
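Here sparsemax refers, following Martins and Astudillo (2016), to the Euclidean projection of a score vector onto the probability simplex. Unlike softmax, it can assign exactly zero probability to low-scoring actions, which is the mechanism behind the claim above. A minimal NumPy sketch (illustrative only, not the authors' code) contrasting the two:

import numpy as np

def sparsemax(z):
    # Euclidean projection of score vector z onto the probability simplex
    # (Martins & Astudillo, 2016). Low-scoring entries get exactly zero mass.
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]                # scores in decreasing order
    k = np.arange(1, z.size + 1)
    cumsum = np.cumsum(z_sorted)
    support = k[1 + k * z_sorted > cumsum]     # candidate support sizes
    k_star = support[-1]                       # size of the supporting set
    tau = (cumsum[k_star - 1] - 1.0) / k_star  # shared threshold
    return np.maximum(z - tau, 0.0)

def softmax(z):
    e = np.exp(z - np.max(z))                  # shift for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.9, 0.1, -1.0])
print(softmax(scores))    # every action keeps strictly positive probability
print(sparsemax(scores))  # -> [0.55, 0.45, 0.0, 0.0]; weak actions excluded

For the example scores, sparsemax places all mass on the top two actions while softmax leaves every action with strictly positive probability.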
Entropic Regularization of Markov Decision Processes
TLDR: A broader family of f-divergences, and more concretely α-divergences, is considered; these inherit the beneficial property of providing the policy improvement step in closed form while also yielding a corresponding dual objective for policy evaluation.
Imitation Learning as f-Divergence Minimization
TLDR: This work proposes a general imitation learning framework for estimating and minimizing any f-divergence, and shows that the approximate I-projection technique is able to imitate multi-modal behaviors more reliably than GAIL and behavior cloning.
Correlated Adversarial Imitation Learning
TLDR: A novel imitation learning algorithm is introduced by applying the game-theoretic notion of correlated equilibrium to generative adversarial imitation learning; in contrast with the classical approach, it is equipped with queues of discriminators and agents.
Semi-Supervised Imitation Learning with Mixed Qualities of Demonstrations for Autonomous Driving
  • Gunmin Lee, Wooseok Oh, +5 authors Songhwai Oh
  • Computer Science
  • ArXiv
  • 2021
TLDR: The experimental results demonstrate the validity of the proposed algorithm using unlabeled trajectories with mixed qualities, and hardware experiments show that the proposed method can be applied to real-world applications.
Divergence-Augmented Policy Optimization
TLDR: Empirical experiments show that in the data-scarce scenario, where the reuse of off-policy data becomes necessary, the method can achieve better performance than other state-of-the-art deep reinforcement learning algorithms.
MixGAIL: Autonomous Driving Using Demonstrations with Mixed Qualities
TLDR: A novel method called mixed generative adversarial imitation learning (MixGAIL) is proposed, which incorporates both expert demonstrations and negative demonstrations, such as vehicle collisions, and converges faster than the other baseline methods.
A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress
TLDR: The survey formally introduces the IRL problem along with its central challenges, which include accurate inference, generalizability, correctness of prior knowledge, and growth in solution complexity with problem size, and elaborates on how current methods mitigate these challenges.
Generative Adversarial Imitation Learning with Deep P-Network for Robotic Cloth Manipulation
TLDR: Experimental results suggest both fast and stable imitation learning ability and sample efficiency of P-GAIL in robotic cloth manipulation.
Sparse Randomized Shortest Paths Routing with Tsallis Divergence Regularization
TLDR: The sparse RSP is a promising model of movements on a graph, balancing sparse exploitation and exploration in an optimal way, and the derived dissimilarity measures based on expected routing costs provide state-of-the-art results.
Inverse Decision Modeling: Learning Interpretable Representations of Behavior
TLDR: This paper develops an expressive, unifying perspective on inverse decision modeling: a framework for learning parameterized representations of sequential decision behavior, which formalizes the forward problem (as a normative standard), subsuming common classes of control behavior.

References

Showing 1-10 of 31 references
Path Consistency Learning in Tsallis Entropy Regularized MDPs
TLDR: A class of novel path consistency learning (PCL) algorithms, called sparse PCL, is proposed for the sparse ERL problem; these algorithms can work with both on-policy and off-policy data and are empirically shown to have an advantage over their soft counterpart, especially in problems with a large number of actions.
Infinite Time Horizon Maximum Causal Entropy Inverse Reinforcement Learning
TLDR: The maximum causal entropy framework is extended to the infinite time horizon setting, and a gradient-based algorithm for the maximum discounted causal entropy formulation is developed that enjoys the desired feature of being model-agnostic, a property absent in many previous IRL algorithms.
Sparse Markov Decision Processes With Causal Sparse Tsallis Entropy Regularization for Reinforcement Learning
TLDR: A sparse Markov decision process (MDP) with novel causal sparse Tsallis entropy regularization is proposed, together with a sparse value iteration method that solves the sparse MDP; the convergence and optimality of sparse value iteration are proven using the Banach fixed-point theorem, and the approach outperforms existing methods in terms of convergence speed and performance.
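For context (not part of the summary above), the Tsallis entropy of a discrete distribution p is commonly written as

S_q(p) = \frac{1}{q-1} \left( 1 - \sum_i p_i^q \right), \qquad q > 0,

which recovers the Shannon entropy -\sum_i p_i \ln p_i in the limit q \to 1; the sparse regularizer in this line of work corresponds, up to scaling conventions that vary across papers, to the q = 2 case \frac{1}{2} \left( 1 - \sum_i p_i^2 \right).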
Maximum Entropy Inverse Reinforcement Learning
TLDR: A probabilistic approach based on the principle of maximum entropy is developed that provides a well-defined, globally normalized distribution over decision sequences while providing the same performance guarantees as existing methods.
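In Ziebart et al.'s formulation, the globally normalized distribution mentioned here takes the exponential-family form

P(\zeta \mid \theta) = \frac{1}{Z(\theta)} \exp\left( \theta^\top f_\zeta \right),

where f_\zeta collects the feature counts of trajectory \zeta and Z(\theta) is the partition function, so trajectories with higher cumulative reward are exponentially more likely.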
Modeling purposeful adaptive behavior with the principle of maximum causal entropy
TLDR: The principle of maximum causal entropy is introduced: a general technique for applying information theory to decision-theoretic, game-theoretic, and control settings where relevant information is sequentially revealed over time.
Robust Bayesian Inverse Reinforcement Learning with Sparse Behavior Noise
TLDR: This paper develops a robust IRL framework that can accurately estimate the reward function in the presence of behavior noise; it introduces a novel latent variable characterizing the reliability of each expert action and uses the Laplace distribution as its prior.
Reinforcement Learning with Deep Energy-Based Policies
TLDR: A method for learning expressive energy-based policies for continuous states and actions, which previously was feasible only in tabular domains, is proposed, together with a new algorithm, called soft Q-learning, that expresses the optimal policy via a Boltzmann distribution.
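The Boltzmann policy referred to in this summary has the standard energy-based form

\pi(a \mid s) \propto \exp\left( \frac{1}{\alpha} Q_{\mathrm{soft}}(s, a) \right),

with temperature \alpha; in contrast to the sparsemax policies discussed above, this softmax-style policy keeps every action's probability strictly positive.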
Robust Imitation of Diverse Behaviors
TLDR: A new version of GAIL is developed that is much more robust than the purely supervised controller, especially with few demonstrations, and avoids mode collapse, capturing many diverse behaviors even when GAIL on its own does not.
Generative Adversarial Imitation Learning
TLDR: A new general framework for directly extracting a policy from data, as if it were obtained by reinforcement learning following inverse reinforcement learning, is proposed, and a certain instantiation of this framework draws an analogy between imitation learning and generative adversarial networks.
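For reference, the GAIL instantiation mentioned here is commonly written as the saddle-point problem

\min_{\pi} \max_{D} \; \mathbb{E}_{\pi}\left[ \log D(s, a) \right] + \mathbb{E}_{\pi_E}\left[ \log\left( 1 - D(s, a) \right) \right] - \lambda H(\pi),

where \pi_E is the expert policy, D is a discriminator distinguishing agent state-action pairs from expert ones, and H(\pi) is a causal entropy regularizer.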
Maximum Entropy Deep Inverse Reinforcement Learning
TLDR: It is shown that the maximum entropy paradigm for IRL lends itself naturally to the efficient training of deep architectures, and the approach achieves performance commensurate with the state-of-the-art on existing benchmarks while exceeding it on an alternative benchmark based on highly varying reward structures.