Robust Bayesian Inverse Reinforcement Learning with Sparse Behavior Noise

@inproceedings{Zheng2014RobustBI,
  title={Robust Bayesian Inverse Reinforcement Learning with Sparse Behavior Noise},
  author={Jiangchuan Zheng and Siyuan Liu and Lionel M. Ni},
  booktitle={AAAI},
  year={2014}
}
Inverse reinforcement learning (IRL) aims to recover the reward function underlying a Markov decision process from expert behaviors in support of decision-making. Most recent work on IRL assumes that all expert behaviors are equally trustworthy, and frames IRL as the process of seeking a reward function that makes those behaviors appear (near-)optimal. In reality, however, noisy expert behaviors that deviate from the optimal policy are common, which may degrade the IRL…
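To make the idea of down-weighting untrustworthy demonstrations concrete, the sketch below mixes a Boltzmann action likelihood with a uniform noise component, controlled by a per-observation reliability weight. This is only an illustrative formulation under assumed names and a mixture form of my choosing; the paper's exact likelihood, sparsity prior, and inference procedure may differ.

```python
# Hedged sketch: a robust expert-action likelihood that mixes a Boltzmann
# ("near-optimal") policy with a uniform noise component, weighted by a
# per-observation reliability z in [0, 1]. Illustrative only; not the
# paper's exact model.
import numpy as np

def boltzmann_policy(Q, alpha=1.0):
    """P(a | s) proportional to exp(alpha * Q[s, a]), computed row-wise."""
    logits = alpha * Q
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def robust_log_likelihood(Q, demos, z, alpha=1.0):
    """
    Q     : (n_states, n_actions) action values under a candidate reward
    demos : list of (state, action) pairs observed from the expert
    z     : per-observation reliability weights in [0, 1]
            (z ~ 1 -> trust the action, z ~ 0 -> treat it as noise)
    """
    pi = boltzmann_policy(Q, alpha)
    n_actions = Q.shape[1]
    ll = 0.0
    for (s, a), zi in zip(demos, z):
        ll += np.log(zi * pi[s, a] + (1.0 - zi) / n_actions)
    return ll

# Toy usage: 3 states, 2 actions; the third observation is treated as likely noise.
Q = np.array([[1.0, 0.0], [0.2, 0.8], [0.5, 0.5]])
demos = [(0, 0), (1, 1), (2, 1)]
z = np.array([0.95, 0.95, 0.2])
print(robust_log_likelihood(Q, demos, z))
```

In a Bayesian treatment of this kind, the reliability weights would themselves get a sparsity-inducing prior so that only a small fraction of demonstrations are explained away as noise.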


CWAE-IRL: Formulating a supervised approach to Inverse Reinforcement Learning problem
TLDR
Experimental results on standard benchmarks such as objectworld and pendulum show that the proposed algorithm can effectively learn the latent reward function in complex, high-dimensional environments.
Distributionally Robust Imitation Learning
TLDR
It is shown that DROIL can be seen as a framework that maximizes a generalized concept of entropy, and a close connection between DROIL and Maximum Entropy Inverse Reinforcement Learning is established.
Bayesian Robust Optimization for Imitation Learning
TLDR
BROIL leverages Bayesian reward function inference and a user-specific risk tolerance to efficiently optimize a robust policy that balances expected return and conditional value at risk, and outperforms existing risk-sensitive and risk-neutral inverse reinforcement learning algorithms.
Estimation of Discount Factor in a Model-Based Inverse Reinforcement Learning Framework
TLDR
This work adapts the model-based maximum entropy IRL framework and opts for a utility-based softmax likelihood function via a feature-based gradient update to jointly learn the discount factor and reward in inverse reinforcement learning.
Efficient Probabilistic Performance Bounds for Inverse Reinforcement Learning
TLDR
A sampling method based on Bayesian inverse reinforcement learning that uses demonstrations to determine practical high-confidence upper bounds on the alpha-worst-case difference in expected return between any evaluation policy and the optimal policy under the expert's unknown reward function is proposed.
Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations
TLDR
A novel reward-learning-from-observation algorithm, Trajectory-ranked Reward EXtrapolation (T-REX), that extrapolates beyond a set of (approximately) ranked demonstrations in order to infer high-quality reward functions from a set of potentially poor demonstrations.
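For context on the T-REX entry above: its core mechanism is a pairwise ranking loss over predicted trajectory returns, in the spirit of a Bradley–Terry preference model. Roughly, with r_θ denoting the learned reward function:

```latex
% Probability that trajectory \tau_j is preferred to \tau_i under the learned reward r_\theta:
P\big(\tau_j \succ \tau_i\big) \approx
\frac{\exp \sum_{s \in \tau_j} r_\theta(s)}
     {\exp \sum_{s \in \tau_i} r_\theta(s) + \exp \sum_{s \in \tau_j} r_\theta(s)}

% Training objective over ranked pairs (\tau_i \prec \tau_j):
\mathcal{L}(\theta) = -\sum_{\tau_i \prec \tau_j} \log P\big(\tau_j \succ \tau_i\big)
```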
Active Reward Learning from Critiques
  • Yuchen Cui, S. Niekum · 2018 IEEE International Conference on Robotics and Automation (ICRA), 2018
TLDR
This work proposes a novel trajectory-based active Bayesian inverse reinforcement learning algorithm that queries the user for critiques of automatically generated trajectories, utilizes trajectory segmentation to expedite the critique/labeling process, and predicts the user's critiques to generate the most highly informative trajectory queries.
Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning
TLDR
Imitation with Planning at Test-time (IMPLANT) is proposed, a new meta-algorithm for imitation learning that utilizes decision-time planning to correct for compounding errors of any base imitation policy.
Inverse Reinforcement Learning from Failure
TLDR
This paper proposes inverse reinforcement learning from failure (IRLF), a new constrained optimisation formulation that accommodates both types of demonstrations while remaining convex and derives update rules for learning reward functions and policies.
...
...

References

Maximum Entropy Inverse Reinforcement Learning
TLDR
A probabilistic approach based on the principle of maximum entropy is developed that provides a well-defined, globally normalized distribution over decision sequences, while providing the same performance guarantees as existing methods.
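For reference, the maximum entropy IRL model cited here assumes a reward linear in state features, r(s) = θᵀφ(s), and places an exponential-family distribution over trajectories. In the deterministic-dynamics case, the model and its log-likelihood gradient can be written as:

```latex
% Trajectory distribution under a linear reward (deterministic dynamics case):
P(\tau \mid \theta) = \frac{1}{Z(\theta)} \exp\!\big(\theta^\top \mathbf{f}_\tau\big),
\qquad \mathbf{f}_\tau = \sum_{s_t \in \tau} \phi(s_t)

% Gradient of the demonstration log-likelihood: empirical feature expectations
% minus expected feature counts under the model (D_s = expected visitation of state s):
\nabla_\theta \mathcal{L}(\theta) = \tilde{\mathbf{f}} - \sum_{s} D_{s}\, \phi(s)
```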
Bayesian Inverse Reinforcement Learning
TLDR
This paper shows how to combine prior knowledge and evidence from the expert's actions to derive a probability distribution over the space of reward functions and presents efficient algorithms that find solutions for the reward learning and apprenticeship learning tasks that generalize well over these distributions.
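The Bayesian IRL formulation referenced here treats the reward R as a random variable and assumes the expert selects actions independently via a Boltzmann policy with a confidence parameter α. Combining that likelihood with a reward prior gives the posterior below, which the original paper samples with MCMC:

```latex
% Boltzmann likelihood of an expert action given reward R:
P(a \mid s, R) = \frac{\exp\!\big(\alpha\, Q^{*}(s, a; R)\big)}{\sum_{b} \exp\!\big(\alpha\, Q^{*}(s, b; R)\big)}

% Posterior over rewards given demonstrations D = \{(s_i, a_i)\}:
P(R \mid D) \;\propto\; P(R)\, \prod_{i} P(a_i \mid s_i, R)
```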
Apprenticeship learning via inverse reinforcement learning
TLDR
This work thinks of the expert as trying to maximize a reward function that is expressible as a linear combination of known features, and gives an algorithm for learning the task demonstrated by the expert, based on using "inverse reinforcement learning" to try to recover the unknown reward function.
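The apprenticeship learning approach above hinges on discounted feature expectations: if the reward is assumed linear in features, R(s) = wᵀφ(s) with ‖w‖₂ ≤ 1, then matching the expert's feature expectations bounds the performance gap under any such reward:

```latex
% Discounted feature expectations of a policy \pi:
\mu(\pi) = \mathbb{E}\Big[\textstyle\sum_{t=0}^{\infty} \gamma^{t}\, \phi(s_t) \,\Big|\, \pi\Big]

% If \|\mu(\pi) - \mu_E\|_2 \le \epsilon, then for any w with \|w\|_2 \le 1:
\big|\, \mathbb{E}[V \mid \pi] - \mathbb{E}[V \mid \pi_E] \,\big|
= \big|\, w^\top\big(\mu(\pi) - \mu_E\big) \,\big| \le \epsilon
```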
Active Learning for Reward Estimation in Inverse Reinforcement Learning
TLDR
An algorithm is proposed that allows the agent to query the demonstrator for samples at specific states, instead of relying only on samples provided at "arbitrary" states, to estimate the reward function with accuracy similar to other methods from the literature while reducing the number of policy samples required from the expert.
Inverse Reinforcement Learning with PI²
TLDR
An algorithm that recovers an unknown cost function from expert-demonstrated trajectories in continuous space by enforcing the constraint that the expert-demonstrated trajectory does not change under the PI² update rule, and hence is locally optimal.
Nonlinear Inverse Reinforcement Learning with Gaussian Processes
TLDR
A probabilistic algorithm that allows complex behaviors to be captured from suboptimal stochastic demonstrations, while automatically balancing the simplicity of the learned reward structure against its consistency with the observed actions.
Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods
TLDR
A novel gradient algorithm to learn a policy from an expert's observed behavior assuming that the expert behaves optimally with respect to some unknown reward function of a Markovian Decision Problem is proposed.
Maximum margin planning
TLDR
This work learns mappings from features to costs so that an optimal policy in an MDP with these costs mimics the expert's behavior, and demonstrates a simple, provably efficient approach to structured maximum margin learning, based on the subgradient method, that leverages existing fast algorithms for inference.
Supervised Probabilistic Robust Embedding with Sparse Noise
TLDR
This paper proposes a supervised probabilistic robust embedding (SPRE) model in which data are corrupted either by sparse noise or by a combination of Gaussian and sparse noise, and devises a two-fold variational EM learning algorithm in which the update of the model parameters has an analytical solution.
Inverse reinforcement learning with evaluation
TLDR
A variant of IRL, called IRL with evaluation (IRLE), in which, instead of observing the desired agent behaviour, the relative evaluation between different behaviours is available through access to an evaluator.
...
...