Publications
Safe Reinforcement Learning via Shielding
TLDR
A new approach to learn optimal policies while enforcing properties expressed in temporal logic by synthesizing a reactive system called a shield, which monitors the actions from the learner and corrects them only if the chosen action causes a violation of the specification.
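As a rough illustration of the shielding loop, the sketch below wraps a learner with a monitor that overrides unsafe choices. The `safe_actions` oracle and the `agent`/`env` interfaces are assumptions for exposition; in the paper, the monitor is a reactive system synthesized automatically from the temporal-logic specification.

```python
def safe_actions(state):
    """Return the set of actions that cannot violate the safety spec."""
    raise NotImplementedError  # stands in for the synthesized shield

def shielded_step(agent, env, state):
    action = agent.act(state)         # the learner proposes an action
    allowed = safe_actions(state)
    if action not in allowed:         # intervene only on a violation
        action = next(iter(allowed))  # substitute a provably safe action
    return env.step(action)
```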
Learning grounded finite-state representations from unstructured demonstrations
TLDR
A series of algorithms is presented that draws from recent advances in Bayesian non-parametric statistics and control theory to automatically detect and leverage repeated structure at multiple levels of abstraction in demonstration data, providing robust generalization and transfer in complex, multi-step robotic tasks.
Learning and generalization of complex tasks from unstructured demonstrations
TLDR
This work uses the Beta Process Autoregressive Hidden Markov Model and Dynamic Movement Primitives to learn and generalize a multi-step task on the PR2 mobile manipulator and to demonstrate the potential of this framework to learn a large library of skills over time.
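To make the motion-primitive half of this pipeline concrete, here is a minimal one-dimensional DMP rollout. The gains, time constant, and zero forcing term are illustrative defaults, not parameters learned in the paper; a learned forcing function would reproduce a demonstrated trajectory shape.

```python
import numpy as np

def dmp_rollout(y0, goal, forcing, tau=1.0, dt=0.01, K=100.0, D=20.0,
                alpha=4.0):
    y, v, x = y0, 0.0, 1.0                      # position, velocity, phase
    traj = [y]
    while x > 1e-3:
        f = forcing(x) * (goal - y0)            # phase-dependent forcing term
        a = (K * (goal - y) - D * v + f) / tau  # spring-damper dynamics
        v += a * dt
        y += v * dt / tau
        x -= alpha * x * dt / tau               # canonical system decays to 0
        traj.append(y)
    return np.array(traj)

trajectory = dmp_rollout(y0=0.0, goal=1.0, forcing=lambda x: 0.0)
```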
Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations
TLDR
A novel reward-learning-from-observation algorithm, Trajectory-ranked Reward EXtrapolation (T-REX), that extrapolates beyond a set of (approximately) ranked demonstrations in order to infer high-quality reward functions from a set of potentially poor demonstrations.
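A hedged sketch of the core ranking loss, using a Bradley-Terry style cross-entropy over predicted trajectory returns (PyTorch; the network width and observation dimension are illustrative assumptions):

```python
import torch
import torch.nn as nn

obs_dim = 8  # illustrative observation dimensionality (an assumption)
reward_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                           nn.Linear(64, 1))

def trex_loss(worse_traj, better_traj):
    # Each trajectory is a (T, obs_dim) tensor of observations; the
    # predicted return is the sum of per-state predicted rewards.
    returns = torch.stack([reward_net(worse_traj).sum(),
                           reward_net(better_traj).sum()])
    # Cross-entropy pushes the higher-ranked trajectory's return up.
    return nn.functional.cross_entropy(returns.unsqueeze(0),
                                       torch.tensor([1]))
```

Minimizing this loss over many ranked pairs yields a reward function that can extrapolate beyond the quality of the demonstrations themselves.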
Machine Teaching for Inverse Reinforcement Learning: Algorithms and Applications
TLDR
This work formalizes the problem of finding maximally informative demonstrations for IRL as a machine teaching problem, where the goal is to find the minimum number of demonstrations needed to specify the demonstrator's reward equivalence class, and proposes an efficient approximation algorithm for determining this set.
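The approximation has the flavor of greedy set cover; a hedged sketch follows, where `constraints_of` is a hypothetical helper mapping a demonstration to the set of reward constraints it induces:

```python
def greedy_demo_selection(demos, all_constraints, constraints_of):
    """Greedily pick demonstrations until their induced constraints
    cover the full set defining the reward equivalence class."""
    chosen, covered = [], set()
    while covered != all_constraints:
        best = max(demos, key=lambda d: len(constraints_of(d) - covered))
        gain = constraints_of(best) - covered
        if not gain:
            break  # remaining constraints not coverable by these demos
        chosen.append(best)
        covered |= gain
    return chosen
```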
Using Natural Language for Reward Shaping in Reinforcement Learning
TLDR
This work proposes the LanguagE-Action Reward Network (LEARN), a framework that maps free-form natural language instructions to intermediate rewards based on actions taken by the agent that can seamlessly be integrated into any standard reinforcement learning algorithm.
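A minimal sketch of the shaping step, assuming a trained `relatedness` model in place of the LEARN network and a hypothetical shaping weight:

```python
def shaped_reward(env_reward, instruction, recent_actions, relatedness,
                  coeff=0.1):
    # Intermediate reward from language-action relatedness; `coeff` is an
    # illustrative weight, not a value taken from the paper.
    return env_reward + coeff * relatedness(instruction, recent_actions)
```

Because the language-based term is simply added to the environment reward, it plugs into any standard RL algorithm unchanged.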
Better-than-Demonstrator Imitation Learning via Automatically-Ranked Demonstrations
TLDR
D-REX is the first imitation learning approach to achieve significant extrapolation beyond the demonstrator's performance without additional side information or supervision, such as rewards or human preferences. It shows that preference-based inverse reinforcement learning can be applied in traditional imitation learning settings where only unlabeled demonstrations are available.
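A sketch of the automatic ranking step under a classic gym-style interface; the noise levels and episode counts are illustrative, and `bc_policy` stands in for a behavior-cloned policy:

```python
import numpy as np

def ranked_rollouts(env, bc_policy,
                    noise_levels=(1.0, 0.75, 0.5, 0.25, 0.0),
                    episodes_per_level=5):
    ranked = []  # ordered from worst (most noise) to best (least noise)
    for eps in noise_levels:
        for _ in range(episodes_per_level):
            obs, done, traj = env.reset(), False, []
            while not done:
                if np.random.rand() < eps:
                    action = env.action_space.sample()  # injected noise
                else:
                    action = bc_policy(obs)             # cloned behavior
                obs, _, done, _ = env.step(action)
                traj.append(obs)
            ranked.append((eps, traj))
    return ranked
```

The resulting noise-ranked trajectory pairs can then be fed to a T-REX-style ranking loss, as sketched above.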
Incremental Semantically Grounded Learning from Demonstration
TLDR
A novel method is introduced for discovering semantically grounded primitives and incrementally building and improving a finite-state representation of a task in which various contingencies can arise.
Active Reward Learning from Critiques
  • Yuchen Cui, S. Niekum
  • IEEE International Conference on Robotics and Automation (ICRA)
  • 1 May 2018
TLDR
This work proposes a novel trajectory-based active Bayesian inverse reinforcement learning algorithm that queries the user for critiques of automatically generated trajectories, utilizes trajectory segmentation to expedite the critique/labeling process, and predicts the user's critiques to generate the most informative trajectory queries.
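One hedged way to read the query-selection step: ask the user about the trajectory whose predicted per-segment critiques are most uncertain. Here `segment` and `predict_label_probs` are hypothetical stand-ins for the paper's trajectory segmentation and critique-prediction components, and maximum entropy is used as a simple informativeness proxy:

```python
import math

def most_informative_query(candidates, segment, predict_label_probs):
    def entropy(traj):
        total = 0.0
        for seg in segment(traj):
            p = predict_label_probs(seg)       # P(segment critiqued "good")
            p = min(max(p, 1e-6), 1.0 - 1e-6)  # guard against log(0)
            total -= p * math.log(p) + (1 - p) * math.log(1 - p)
        return total
    return max(candidates, key=entropy)
```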
Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation
TLDR
Two bootstrapping off-policy evaluation methods are proposed that use learned MDP transition models to estimate lower confidence bounds on policy performance with limited data, in both continuous and discrete state spaces.
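A minimal sketch of the percentile-bootstrap idea, with `fit_model` and `evaluate_policy` as hypothetical stand-ins for the paper's learned MDP transition models and model-based evaluation:

```python
import numpy as np

def bootstrap_lower_bound(transitions, policy, fit_model, evaluate_policy,
                          n_boot=200, delta=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n, estimates = len(transitions), []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample with replacement
        model = fit_model([transitions[i] for i in idx])
        estimates.append(evaluate_policy(policy, model))
    return np.percentile(estimates, 100 * delta)  # e.g. the 5th percentile
```

The returned value serves as an approximate (1 - delta) lower confidence bound on the policy's performance.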