Corpus ID: 237532555

Marginal MAP Estimation for Inverse RL under Occlusion with Observer Noise

@article{Suresh2021MarginalME,
  title={Marginal MAP Estimation for Inverse RL under Occlusion with Observer Noise},
  author={Prasanth Sengadu Suresh and Prashant Doshi},
  journal={ArXiv},
  year={2021},
  volume={abs/2109.07788}
}
We consider the problem of learning the behavioral preferences of an expert engaged in a task from noisy and partially-observable demonstrations. This is motivated by real-world applications such as a line robot learning from observing a human worker, where some observations are occluded by environmental objects that cannot be removed. Furthermore, robotic perception tends to be imperfect and noisy. Previous techniques for inverse reinforcement learning (IRL) take the approach of either…
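The title points to a marginal MAP (maximum a posteriori) formulation. As a hedged sketch of what such an objective typically looks like (a generic form, not taken verbatim from the paper): let Y denote the observed, noisy portion of a demonstration, Z the occluded portion, and \theta the reward parameters. Then

$$\hat{\theta}_{\text{MMAP}} = \arg\max_{\theta}\; p(\theta) \sum_{Z} p(Y, Z \mid \theta),$$

i.e., the hidden data Z is summed out rather than jointly maximized over, which distinguishes marginal MAP from both ordinary MAP over \theta alone and joint MAP over (\theta, Z).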


References

Showing 1–10 of 24 references
Scaling Expectation-Maximization for Inverse Reinforcement Learning to Multiple Robots under Occlusion
TLDR
This work explores various blocking schemes and presents methods for speeding up the computation of conditional expectations by employing blocked Gibbs sampling, and demonstrates that these methods offer significantly improved performance over existing IRL techniques under occlusion.
Expectation-Maximization for Inverse Reinforcement Learning with Hidden Data
TLDR
This work considers the problem of performing inverse reinforcement learning when the trajectory of the agent being observed is partially occluded from view, and presents an algorithm based on expectation-maximization to solve the non-linear, non-convex problem.
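To illustrate the EM idea this line of work builds on, here is a minimal toy sketch under my own simplifying assumptions (a tiny known-dynamics MDP, a soft-value-iteration policy, and occluded actions only), not the cited algorithm: the E-step fills occluded timesteps with expectations under the current policy, and the M-step takes a feature-matching gradient step on the reward weights.

    # Illustrative EM-style IRL with occluded actions (hedged sketch, not the paper's method).
    import numpy as np

    rng = np.random.default_rng(0)
    n_states, n_actions, gamma = 5, 2, 0.9
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] -> next-state dist
    phi = rng.normal(size=(n_states, n_actions, 3))                   # feature map phi(s, a)

    def soft_q_policy(theta, iters=50):
        """Soft value iteration; returns a softmax policy for reward r(s,a) = theta . phi(s,a)."""
        r = phi @ theta
        q = np.zeros((n_states, n_actions))
        for _ in range(iters):
            v = np.log(np.exp(q).sum(axis=1))      # soft state values
            q = r + gamma * P @ v
        pi = np.exp(q - q.max(axis=1, keepdims=True))
        return pi / pi.sum(axis=1, keepdims=True)

    # One demonstration; None marks an occluded action at that timestep.
    demo = [(0, 1), (2, None), (4, 0), (1, None)]

    theta = np.zeros(3)
    for _ in range(100):                           # EM iterations
        pi = soft_q_policy(theta)
        # E-step: expected empirical feature counts, filling occluded actions
        # with their expectation under the current policy.
        mu_hat = np.zeros(3)
        for s, a in demo:
            mu_hat += phi[s, a] if a is not None else pi[s] @ phi[s]
        # M-step (one gradient step): match the learner's expected feature counts
        # at the demonstrated states to the (expected) empirical counts.
        mu_pi = sum(pi[s] @ phi[s] for s, _ in demo)
        theta += 0.05 * (mu_hat - mu_pi)
    print("learned reward weights:", theta)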
Inverse Reinforcement Learning Under Noisy Observations
TLDR
This work treats the expert's state and action as hidden data and presents an algorithm based on expectation-maximization and the maximum entropy principle to solve the non-linear, non-convex problem of inverse reinforcement learning.
Inverse Reinforcement Learning in Partially Observable Environments
TLDR
This paper presents IRL algorithms for partially observable environments that can be modeled as a partially observable Markov decision process (POMDP), and deals with two cases according to the representation of the given expert's behavior.
Robust Bayesian Inverse Reinforcement Learning with Sparse Behavior Noise
TLDR
This paper develops a robust IRL framework that can accurately estimate the reward function in the presence of behavior noise, introducing a novel latent variable that characterizes the reliability of each expert action and using a Laplace distribution as its prior.
Multi-Robot Inverse Reinforcement Learning Under Occlusion with State Transition Estimation
TLDR
A crucial assumption in IRL is relaxed to make it better suited for wider robotic applications: the transition functions of the other robots are allowed to be stochastic, and the transition error probabilities are not assumed to be known to the learner.
Model-Free IRL Using Maximum Likelihood Estimation
TLDR
A model-free approach to IRL is presented, which casts IRL in the maximum likelihood framework and uses gradient ascent to update the feature weights so as to maximize the likelihood of the expert's trajectories.
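The following is only an illustration of the likelihood-gradient form, not the cited method: it assumes, for simplicity, that the expert follows a Boltzmann policy in a linear score theta . phi(s, a) (the cited approach works with Q-values estimated without a model), and runs gradient ascent on the log-likelihood of the demonstrated state-action pairs.

    # Hedged sketch: maximum-likelihood fitting of feature weights by gradient ascent.
    import numpy as np

    rng = np.random.default_rng(1)
    n_states, n_actions, d = 6, 3, 4
    phi = rng.normal(size=(n_states, n_actions, d))    # feature map phi(s, a)
    theta_true = rng.normal(size=d)

    def boltzmann(theta, s):
        """Softmax policy over actions in state s for score theta . phi(s, a)."""
        logits = phi[s] @ theta
        p = np.exp(logits - logits.max())
        return p / p.sum()

    # Simulated "expert" state-action pairs drawn from the true Boltzmann policy.
    states = rng.integers(n_states, size=200)
    demos = [(s, rng.choice(n_actions, p=boltzmann(theta_true, s))) for s in states]

    # Gradient ascent on the log-likelihood of the demonstrated actions.
    theta = np.zeros(d)
    for _ in range(300):
        grad = np.zeros(d)
        for s, a in demos:
            p = boltzmann(theta, s)
            grad += phi[s, a] - p @ phi[s]             # d/dtheta log pi_theta(a | s)
        theta += 0.1 * grad / len(demos)
    print("true weights:   ", np.round(theta_true, 2))
    print("learned weights:", np.round(theta, 2))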
Inverse Reinforcement Learning with Missing Data
TLDR
Empirical evaluation on a real-world dataset shows that the proposed tractable approach, which directly computes the log-likelihood of demonstrated trajectories with incomplete/missing data, outperforms other conventional techniques.
A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress
TLDR
The survey formally introduces the IRL problem along with its central challenges, which include accurate inference, generalizability, correctness of prior knowledge, and growth in solution complexity with problem size, and elaborates on how current methods mitigate these challenges.
MAP Inference for Bayesian Inverse Reinforcement Learning
TLDR
This work addresses the difficulty of inverse reinforcement learning (IRL) by using maximum a posteriori (MAP) estimation for the reward function, and shows that most previous IRL algorithms can be cast within this framework.
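For contrast with the marginal MAP objective sketched earlier, ordinary MAP inference for Bayesian IRL (again a generic sketch, not this paper's exact notation) maximizes the reward posterior given fully observed demonstrations $\mathcal{D}$:

$$\hat{\theta}_{\text{MAP}} = \arg\max_{\theta}\; \log p(\mathcal{D} \mid \theta) + \log p(\theta).$$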