Corpus ID: 13973870

# Robust Bayesian Inverse Reinforcement Learning with Sparse Behavior Noise

@inproceedings{Zheng2014RobustBI,
title={Robust Bayesian Inverse Reinforcement Learning with Sparse Behavior Noise},
author={Jiangchuan Zheng and Siyuan Liu and Lionel M. Ni},
booktitle={AAAI},
year={2014}
}
• Published in AAAI 2014
• Computer Science
Inverse reinforcement learning (IRL) aims to recover the reward function underlying a Markov Decision Process from the behaviors of experts, in support of decision-making. Most recent work on IRL assumes that all expert behaviors are equally trustworthy, and frames IRL as the search for a reward function that makes those behaviors appear (near-)optimal. In practice, however, noisy expert behaviors that deviate from the optimal policy are common, and they may degrade IRL performance…
43 Citations
CWAE-IRL: Formulating a supervised approach to Inverse Reinforcement Learning problem
Experimental results on standard benchmarks such as objectworld and pendulum show that the proposed algorithm can effectively learn the latent reward function in complex, high-dimensional environments.
Marginal MAP Estimation for Inverse RL under Occlusion with Observer Noise
• Computer Science
• ArXiv
• 2021
It is shown that the marginal MAP (MMAP) approach significantly improves on the previous IRL technique under occlusion in both formative evaluations on a toy problem and in a summative evaluation on an onion sorting line task by a robot.
Efficient Probabilistic Performance Bounds for Inverse Reinforcement Learning
• Computer Science, Mathematics
• AAAI
• 2018
A sampling method based on Bayesian inverse reinforcement learning that uses demonstrations to determine practical high-confidence upper bounds on the $\alpha$-worst-case difference in expected return between any evaluation policy and the optimal policy under the expert's unknown reward function is proposed.
Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations
• Computer Science, Mathematics
• ICML
• 2019
A novel reward-learning-from-observation algorithm, Trajectory-ranked Reward EXtrapolation (T-REX), that extrapolates beyond a set of (approximately) ranked demonstrations in order to infer high-quality reward functions from a set of potentially poor demonstrations.
Active Learning from Critiques via Bayesian Inverse Reinforcement Learning
• Computer Science
• 2017
A novel trajectory-based active Bayesian inverse reinforcement learning algorithm that 1) queries the user for critiques of automatically generated trajectories, 2) utilizes trajectory segmentation to expedite the critique / labeling process, and 3) predicts the user’s critiques to generate the most highly informative trajectory queries.
Robust Imitation via Decision-Time Planning
• 2020
The goal of imitation learning is to mimic expert behavior from demonstrations, without access to an explicit reward signal. A popular class of approaches infers the (unknown) reward function via…
Active Reward Learning from Critiques
• Computer Science
• 2018 IEEE International Conference on Robotics and Automation (ICRA)
• 2018
This work proposes a novel trajectory-based active Bayesian inverse reinforcement learning algorithm that queries the user for critiques of automatically generated trajectories, utilizes trajectory segmentation to expedite the critique / labeling process, and predicts the user's critiques to generate the most highly informative trajectory queries.
On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference
• Computer Science, Mathematics
• ICML
• 2019
Mixed findings suggest that at least for the foreseeable future, agents need a middle ground between the flexibility of data-driven methods and the useful bias of known human biases.
Inverse Reinforcement Learning From Like-Minded Teachers
• Computer Science
• AAAI
• 2021
It is demonstrated that inverse reinforcement learning algorithms that satisfy a certain property — that of matching feature expectations — yield policies that are approximately optimal with respect to the underlying reward function, and that no algorithm can do better in the worst case.
Inverse Reinforcement Learning from Failure
• Computer Science
• AAMAS
• 2016
This paper proposes inverse reinforcement learning from failure (IRLF), a new constrained optimisation formulation that accommodates both types of demonstrations while remaining convex, and derives update rules for learning reward functions and policies.

#### References

SHOWING 1-10 OF 28 REFERENCES
Maximum Entropy Inverse Reinforcement Learning
• Computer Science
• AAAI
• 2008
A probabilistic approach based on the principle of maximum entropy is developed that provides a well-defined, globally normalized distribution over decision sequences, while providing the same performance guarantees as existing methods.
Bayesian Inverse Reinforcement Learning
• Computer Science
• IJCAI
• 2007
This paper shows how to combine prior knowledge and evidence from the expert's actions to derive a probability distribution over the space of reward functions and presents efficient algorithms that find solutions for the reward learning and apprenticeship learning tasks that generalize well over these distributions.
Apprenticeship learning via inverse reinforcement learning
• Computer Science
• ICML
• 2004
This work thinks of the expert as trying to maximize a reward function that is expressible as a linear combination of known features, and gives an algorithm for learning the task demonstrated by the expert, based on using "inverse reinforcement learning" to try to recover the unknown reward function.
Active Learning for Reward Estimation in Inverse Reinforcement Learning
• Computer Science
• ECML/PKDD
• 2009
An algorithm is proposed that allows the agent to query the demonstrator for samples at specific states, instead of relying only on samples provided at "arbitrary" states, to estimate the reward function with similar accuracy as other methods from the literature while reducing the amount of policy samples required from the expert.
Inverse Reinforcement Learning with PI²
We present an algorithm that recovers an unknown cost function from expert-demonstrated trajectories in continuous space. We assume that the cost function is a weighted linear combination of…
Nonlinear Inverse Reinforcement Learning with Gaussian Processes
• Mathematics, Computer Science
• NIPS
• 2011
A probabilistic algorithm that allows complex behaviors to be captured from suboptimal stochastic demonstrations, while automatically balancing the simplicity of the learned reward structure against its consistency with the observed actions.
Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods
• Computer Science, Mathematics
• UAI
• 2007
A novel gradient algorithm is proposed to learn a policy from an expert's observed behavior, assuming that the expert behaves optimally with respect to some unknown reward function of a Markovian Decision Problem.
• Computer Science, Mathematics
• EWRL
• 2011
The main contribution is to formalise the problem of inverse reinforcement learning as statistical preference elicitation, via a number of structured priors, whose form captures the authors' biases about the relatedness of different tasks or expert policies.
Maximum margin planning
• Computer Science
• ICML
• 2006
This work learns mappings from features to costs so that an optimal policy in an MDP with these costs mimics the expert's behavior, and demonstrates a simple, provably efficient approach to structured maximum margin learning, based on the subgradient method, that leverages existing fast algorithms for inference.
Supervised Probabilistic Robust Embedding with Sparse Noise
• Computer Science
• AAAI
• 2012
This paper proposes a supervised probabilistic robust embedding (SPRE) model in which data are corrupted either by sparse noise or by a combination of Gaussian and sparse noise, and devises a twofold variational EM learning algorithm in which the update of model parameters has an analytical solution.