# Robust Bayesian Inverse Reinforcement Learning with Sparse Behavior Noise

@inproceedings{Zheng2014RobustBI, title={Robust Bayesian Inverse Reinforcement Learning with Sparse Behavior Noise}, author={Jiangchuan Zheng and Siyuan Liu and Lionel M. Ni}, booktitle={AAAI}, year={2014} }

Inverse reinforcement learning (IRL) aims to recover the reward function underlying a Markov Decision Process from the behaviors of experts, in support of decision-making. Most recent work on IRL assumes that all expert behaviors are equally trustworthy, and frames IRL as a search for a reward function that makes those behaviors appear (near-)optimal. However, noisy expert behaviors that deviate from the optimal policy are common in practice, and they may degrade the IRL…
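The abstract is truncated here and does not state the model itself. As a hedged illustration of the general idea only, and not the authors' formulation, one simple way to encode "most expert behaviors are near-optimal, a few are noise" is a per-pair mixture: each observed state-action pair comes from a Boltzmann-rational expert with prior probability 1 - ε, or from a uniform noise process with a small prior probability ε. The function `robust_log_likelihood` and the parameters `q_table`, `beta` and `eps` are hypothetical names chosen for this sketch.

```python
import math


def boltzmann_log_probs(q_values, beta):
    """Log-probabilities of a Boltzmann (softmax) policy over Q-values."""
    scaled = [beta * q for q in q_values]
    m = max(scaled)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


def robust_log_likelihood(demos, q_table, beta=1.0, eps=0.05):
    """Hypothetical mixture likelihood: expert (prob 1 - eps) or uniform noise (prob eps).

    demos:   list of (state, action) pairs
    q_table: dict mapping state -> list of Q-values under a candidate reward
    Returns the total log-likelihood and, for each pair, the posterior
    probability that it was produced by the noise component.
    """
    total = 0.0
    noise_posteriors = []
    for state, action in demos:
        q_values = q_table[state]
        p_expert = math.exp(boltzmann_log_probs(q_values, beta)[action])
        p_noise = 1.0 / len(q_values)  # assumption: noisy actions are uniform
        p_mix = (1.0 - eps) * p_expert + eps * p_noise
        total += math.log(p_mix)
        noise_posteriors.append(eps * p_noise / p_mix)
    return total, noise_posteriors


q_table = {0: [1.0, 0.0], 1: [0.0, 2.0]}
demos = [(0, 0), (1, 1), (1, 0)]  # the last pair takes the low-value action
log_lik, noise_post = robust_log_likelihood(demos, q_table)
print(round(log_lik, 3), [round(p, 3) for p in noise_post])
```

In this sketch the pair `(1, 0)`, which picks the lower-valued action, receives the largest noise posterior, so it pulls less on the reward estimate than it would under a pure Boltzmann likelihood; setting `eps=0` recovers the standard, non-robust likelihood.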

## 51 Citations

CWAE-IRL: Formulating a supervised approach to Inverse Reinforcement Learning problem

- Computer Science, ArXiv
- 2019

Experimental results on standard benchmarks such as objectworld and pendulum show that the proposed algorithm can effectively learn the latent reward function in complex, high-dimensional environments.

Distributionally Robust Imitation Learning

- Computer Science, NeurIPS
- 2021

It is shown that DROIL can be seen as a framework that maximizes a generalized concept of entropy, and a close connection between DROIL and Maximum Entropy Inverse Reinforcement Learning is established.

Bayesian Robust Optimization for Imitation Learning

- Computer Science, NeurIPS
- 2020

BROIL leverages Bayesian reward function inference and a user-specific risk tolerance to efficiently optimize a robust policy that balances expected return and conditional value at risk (CVaR); it outperforms existing risk-sensitive and risk-neutral inverse reinforcement learning algorithms.
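BROIL's objective trades off the posterior mean of a policy's performance against its conditional value at risk (CVaR) over the reward posterior. A minimal sketch, assuming the return of each candidate policy under every posterior reward sample has already been computed, using a simple empirical lower-tail CVaR estimator, and choosing from a fixed list of candidate policies (a simplification: BROIL optimizes over policies rather than picking from a list). The names `empirical_cvar`, `broil_style_score` and `pick_policy`, and the blending weight `lam`, are hypothetical.

```python
import math


def empirical_cvar(returns, alpha):
    """Mean of the worst ceil(alpha * n) sampled returns (a simple lower-tail estimator)."""
    k = max(1, math.ceil(alpha * len(returns)))
    worst = sorted(returns)[:k]
    return sum(worst) / k


def broil_style_score(returns, lam, alpha):
    """lam * mean + (1 - lam) * CVaR_alpha over posterior reward samples."""
    mean = sum(returns) / len(returns)
    return lam * mean + (1.0 - lam) * empirical_cvar(returns, alpha)


def pick_policy(policy_returns, lam=0.5, alpha=0.2):
    """policy_returns: dict policy name -> list of returns, one per posterior reward sample."""
    return max(policy_returns, key=lambda name: broil_style_score(policy_returns[name], lam, alpha))


# Returns of two candidate policies under the same five posterior reward samples.
candidates = {
    "risky": [10.0, 10.0, 10.0, 10.0, -20.0],  # mean 4.0, but one very bad sample
    "safe": [3.0, 3.0, 3.0, 3.0, 3.0],         # mean 3.0, no downside
}
print(pick_policy(candidates, lam=1.0))  # risk-neutral, prints "risky"
print(pick_policy(candidates, lam=0.0))  # fully risk-averse, prints "safe"
```

Sliding `lam` between 0 and 1 plays the role of the user-specific risk tolerance: the same posterior samples lead to different policy choices depending on how much weight the worst-case tail receives.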

Estimation of Discount Factor in a Model-Based Inverse Reinforcement Learning Framework

- Computer Science
- 2021

This work adapts the model-based maximum entropy IRL framework and opts for a utility-based softmax likelihood function via a feature-based gradient update to jointly learn the discount factor and reward in Inverse Reinforcement Learning.

Efficient Probabilistic Performance Bounds for Inverse Reinforcement Learning

- Computer Science, AAAI
- 2018

A sampling method based on Bayesian inverse reinforcement learning that uses demonstrations to determine practical high-confidence upper bounds on the alpha-worst-case difference in expected return between any evaluation policy and the optimal policy under the expert's unknown reward function is proposed.

Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations

- Computer Science, ICML
- 2019

This paper introduces a novel reward-learning-from-observation algorithm, Trajectory-ranked Reward EXtrapolation (T-REX), which extrapolates beyond a set of (approximately) ranked demonstrations to infer high-quality reward functions from a set of potentially poor demonstrations.
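T-REX trains a reward function so that higher-ranked trajectories receive higher predicted return, using a pairwise ranking loss of the Bradley-Terry (logistic) form. A minimal sketch, assuming a linear reward over hand-made per-state features and plain gradient descent, instead of the neural network reward model used in the paper; `predicted_return`, `ranking_loss_and_grad` and `train_reward` are hypothetical names.

```python
import math


def predicted_return(w, traj):
    """Sum of a linear per-state reward w . phi(s) over a trajectory of feature vectors."""
    return sum(sum(wi * fi for wi, fi in zip(w, phi)) for phi in traj)


def ranking_loss_and_grad(w, worse, better):
    """Pairwise logistic (Bradley-Terry) loss -log P(better ranked above worse), and its gradient."""
    diff = predicted_return(w, better) - predicted_return(w, worse)
    # Numerically stable log(1 + exp(-diff)) and sigmoid(diff).
    if diff >= 0:
        loss = math.log1p(math.exp(-diff))
        p = 1.0 / (1.0 + math.exp(-diff))
    else:
        loss = -diff + math.log1p(math.exp(diff))
        p = math.exp(diff) / (1.0 + math.exp(diff))
    dim = len(w)
    feat_diff = [sum(phi[k] for phi in better) - sum(phi[k] for phi in worse) for k in range(dim)]
    grad = [-(1.0 - p) * d for d in feat_diff]  # d loss / d w
    return loss, grad


def train_reward(ranked_pairs, dim, lr=0.1, epochs=200):
    """Gradient descent on the ranking loss; ranked_pairs holds (worse, better) trajectory pairs."""
    w = [0.0] * dim
    for _ in range(epochs):
        for worse, better in ranked_pairs:
            _, grad = ranking_loss_and_grad(w, worse, better)
            w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


# Feature 0 marks "progress"; feature 1 is irrelevant and identical across the pair.
worse = [[0.0, 0.0], [0.0, 1.0]]
better = [[1.0, 0.0], [1.0, 1.0]]
w = train_reward([(worse, better)], dim=2)
print(w, predicted_return(w, better) > predicted_return(w, worse))
```

Because the learned weight on the "progress" feature is positive, a trajectory with more progress than any ranked demonstration receives an even higher predicted return; this is the sense in which a ranking-trained reward can extrapolate beyond the demonstrations.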

Active Reward Learning from Critiques

- Computer Science, 2018 IEEE International Conference on Robotics and Automation (ICRA)
- 2018

This work proposes a novel trajectory-based active Bayesian inverse reinforcement learning algorithm that queries the user for critiques of automatically generated trajectories, utilizes trajectory segmentation to expedite the critique / labeling process, and predicts the user's critiques to generate the most highly informative trajectory queries.

Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning

- Computer Science, ArXiv
- 2022

Imitation with Planning at Test-time (IMPLANT), a new meta-algorithm for imitation learning, is proposed; it uses decision-time planning to correct for the compounding errors of any base imitation policy.

Inverse Reinforcement Learning from Failure

- Computer Science, AAMAS
- 2016

This paper proposes inverse reinforcement learning from failure (IRLF), a new constrained optimisation formulation that accommodates both successful and failed demonstrations while remaining convex, and derives update rules for learning reward functions and policies.

Combination of learning from non-optimal demonstrations and feedbacks using inverse reinforcement learning and Bayesian policy improvement

- Computer Science, Expert Syst. Appl.
- 2018

## References

Showing 1-10 of 28 references

Maximum Entropy Inverse Reinforcement Learning

- Computer Science, AAAI
- 2008

This paper develops a probabilistic approach based on the principle of maximum entropy that provides a well-defined, globally normalized distribution over decision sequences while offering the same performance guarantees as existing methods.
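In maximum entropy IRL, a trajectory τ with feature counts f_τ has probability proportional to exp(θᵀf_τ), and the gradient of the demonstrations' log-likelihood is the empirical feature counts minus the model's expected feature counts. A minimal sketch, assuming a small, explicitly enumerated set of candidate trajectories rather than the dynamic-programming computation of expected counts over the MDP used in the paper; `trajectory_probs`, `log_likelihood_grad` and `fit_maxent` are hypothetical names.

```python
import math


def trajectory_probs(theta, traj_features):
    """P(tau) proportional to exp(theta . f_tau), normalized over an enumerated trajectory set."""
    scores = [sum(t * f for t, f in zip(theta, feats)) for feats in traj_features]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return [wt / z for wt in weights]


def log_likelihood_grad(theta, traj_features, demo_indices):
    """Gradient of the mean demo log-likelihood: empirical minus expected feature counts."""
    dim = len(theta)
    probs = trajectory_probs(theta, traj_features)
    empirical = [sum(traj_features[i][k] for i in demo_indices) / len(demo_indices) for k in range(dim)]
    expected = [sum(p * feats[k] for p, feats in zip(probs, traj_features)) for k in range(dim)]
    return [e - x for e, x in zip(empirical, expected)]


def fit_maxent(traj_features, demo_indices, lr=0.5, steps=500):
    """Gradient ascent on the (concave) log-likelihood."""
    theta = [0.0] * len(traj_features[0])
    for _ in range(steps):
        grad = log_likelihood_grad(theta, traj_features, demo_indices)
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta


# Three candidate trajectories described by their feature counts f_tau.
traj_features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
demo_indices = [0, 0, 2]  # the expert took trajectory 0 twice and trajectory 2 once
theta = fit_maxent(traj_features, demo_indices)
print([round(t, 3) for t in theta], [round(p, 3) for p in trajectory_probs(theta, traj_features)])
```

At convergence the gradient vanishes, so the fitted distribution reproduces the expert's average feature counts (feature matching) while spreading probability over all trajectories as evenly as that constraint allows.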

Bayesian Inverse Reinforcement Learning

- Computer Science, IJCAI
- 2007

This paper shows how to combine prior knowledge and evidence from the expert's actions to derive a probability distribution over the space of reward functions and presents efficient algorithms that find solutions for the reward learning and apprenticeship learning tasks that generalize well over these distributions.
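Bayesian IRL combines a prior over reward functions with a Boltzmann likelihood of the expert's actions, in which P(a | s, R) is proportional to exp(α Q*(s, a; R)). A minimal sketch, assuming a toy three-state chain MDP, binary per-state rewards, a uniform prior and exact enumeration of all eight candidate rewards; the paper instead samples the posterior with an MCMC procedure (PolicyWalk), which scales to larger reward spaces. All function names here are hypothetical.

```python
import itertools
import math


def q_values(reward, gamma=0.9, iters=200):
    """Value iteration on a deterministic chain: action 0 moves left, action 1 moves right."""
    n = len(reward)

    def step(s, a):
        return max(0, s - 1) if a == 0 else min(n - 1, s + 1)

    v = [0.0] * n
    for _ in range(iters):
        v = [max(reward[s] + gamma * v[step(s, a)] for a in (0, 1)) for s in range(n)]
    return [[reward[s] + gamma * v[step(s, a)] for a in (0, 1)] for s in range(n)]


def log_likelihood(reward, demos, alpha=1.0):
    """Boltzmann likelihood of expert actions: P(a | s, R) proportional to exp(alpha * Q*(s, a))."""
    q = q_values(reward)
    total = 0.0
    for s, a in demos:
        log_z = math.log(sum(math.exp(alpha * qa) for qa in q[s]))
        total += alpha * q[s][a] - log_z
    return total


def reward_posterior(demos, n_states=3, levels=(0.0, 1.0)):
    """Exact posterior over all candidate reward vectors under a uniform prior."""
    candidates = list(itertools.product(levels, repeat=n_states))
    log_post = [log_likelihood(r, demos) for r in candidates]  # uniform prior adds a constant
    m = max(log_post)
    weights = [math.exp(lp - m) for lp in log_post]
    z = sum(weights)
    return {r: wt / z for r, wt in zip(candidates, weights)}


demos = [(0, 1), (1, 1)]  # the expert moves right in states 0 and 1
posterior = reward_posterior(demos)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))
```

The result is a full distribution rather than a single reward: rewards that explain the rightward moves well (here, a reward only at the right end of the chain) get most of the mass, while uninformative rewards such as all zeros keep some probability.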

Apprenticeship learning via inverse reinforcement learning

- Computer Science, ICML
- 2004

This work models the expert as trying to maximize a reward function expressible as a linear combination of known features, and gives an algorithm for learning the task demonstrated by the expert, based on using "inverse reinforcement learning" to try to recover the unknown reward function.
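Because the reward is assumed linear in known features, R(s) = w·φ(s), a policy's value equals w·μ(π), where μ(π) is its expected discounted sum of features; a policy whose feature expectations are within ε of the expert's (in Euclidean norm) is therefore within ε in value for every w with ‖w‖₂ ≤ 1. A minimal sketch of that quantity, assuming trajectories are given as lists of per-state feature vectors and estimating μ by Monte Carlo averaging; the iterative loop that repeatedly solves an MDP to shrink the gap is omitted, and the function names are hypothetical.

```python
import math


def feature_expectations(trajectories, gamma):
    """Empirical discounted feature expectations: mean over trajectories of sum_t gamma^t * phi(s_t)."""
    dim = len(trajectories[0][0])
    mu = [0.0] * dim
    for traj in trajectories:
        for t, phi in enumerate(traj):
            for k in range(dim):
                mu[k] += (gamma ** t) * phi[k]
    return [m / len(trajectories) for m in mu]


def linear_value(w, mu):
    """With R(s) = w . phi(s), a policy's expected discounted return is w . mu."""
    return sum(wi * mi for wi, mi in zip(w, mu))


def feature_gap(mu_expert, mu_policy):
    """Euclidean distance between feature expectations.

    By Cauchy-Schwarz, |w . mu_expert - w . mu_policy| <= gap for any w with ||w||_2 <= 1.
    """
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mu_expert, mu_policy)))


# Each trajectory is a list of per-state feature vectors phi(s_t).
expert_trajs = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
policy_trajs = [[[1.0, 0.0], [0.0, 1.0]]]
mu_e = feature_expectations(expert_trajs, gamma=0.5)  # [0.75, 0.75]
mu_p = feature_expectations(policy_trajs, gamma=0.5)  # [1.0, 0.5]
gap = feature_gap(mu_e, mu_p)
w = [0.6, -0.8]  # an arbitrary unit-norm reward weight
print(gap, abs(linear_value(w, mu_e) - linear_value(w, mu_p)) <= gap)
```

This is why matching the expert's feature expectations is enough: the unknown w never has to be recovered exactly, because any policy with a small feature gap performs nearly as well as the expert under every admissible w.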

Active Learning for Reward Estimation in Inverse Reinforcement Learning

- Computer Science, ECML/PKDD
- 2009

An algorithm is proposed that lets the agent query the demonstrator for samples at specific states, instead of relying only on samples provided at "arbitrary" states; it estimates the reward function with accuracy similar to other methods from the literature while reducing the number of policy samples required from the expert.

Inverse Reinforcement Learning with PI²

- Computer Science
- 2010

This paper presents an algorithm that recovers an unknown cost function from expert-demonstrated trajectories in continuous space by enforcing the constraint that the expert-demonstrated trajectory does not change under the PI² update rule, and hence is locally optimal.

Nonlinear Inverse Reinforcement Learning with Gaussian Processes

- Computer Science, NIPS
- 2011

This paper presents a probabilistic algorithm that allows complex behaviors to be captured from suboptimal stochastic demonstrations while automatically balancing the simplicity of the learned reward structure against its consistency with the observed actions.

Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods

- Computer Science, UAI
- 2007

A novel gradient algorithm to learn a policy from an expert's observed behavior assuming that the expert behaves optimally with respect to some unknown reward function of a Markovian Decision Problem is proposed.

Maximum margin planning

- Computer Science, ICML
- 2006

This work learns mappings from features to cost so that an optimal policy in an MDP with this cost mimics the expert's behavior, and demonstrates a simple, provably efficient approach to structured maximum margin learning, based on the subgradient method, that leverages existing fast algorithms for inference.

Supervised Probabilistic Robust Embedding with Sparse Noise

- Computer Science, AAAI
- 2012

This paper proposes a supervised probabilistic robust embedding (SPRE) model in which data are corrupted either by sparse noise or by a combination of Gaussian and sparse noise, and devises a two-fold variational EM learning algorithm in which the update of the model parameters has an analytical solution.

Inverse reinforcement learning with evaluation

- Computer Science, Proceedings 2006 IEEE International Conference on Robotics and Automation (ICRA 2006)
- 2006

This paper proposes a variant of IRL called IRL with evaluation (IRLE), in which, instead of observing the desired agent behaviour, the learner has access to an evaluator that provides the relative evaluation between different behaviours.