Publications (sorted by influence)
Safe Reinforcement Learning via Shielding
TLDR: We introduce a new approach to learn optimal policies while enforcing properties expressed in temporal logic.
  • Citations: 125
  • Highly influential citations: 12
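A minimal sketch of the shielding idea (not the paper's implementation): before an action is executed, the shield checks it against a safety check derived from the temporal-logic specification and substitutes a safe action if the proposed one would violate the property. Here `is_safe` is a hypothetical stand-in for that check.

```python
import random

def is_safe(state, action):
    # Hypothetical safety predicate: e.g., "never end up in cell 0".
    return state + action != 0

def shielded_step(state, proposed_action, action_space):
    """Execute the agent's action if safe, else the first safe alternative."""
    if is_safe(state, proposed_action):
        return proposed_action
    safe = [a for a in action_space if is_safe(state, a)]
    return safe[0] if safe else proposed_action  # no safe action available

# The learner trains as usual, but only shielded actions reach the environment.
state, actions = 1, [-1, 0, 1]
action = shielded_step(state, random.choice(actions), actions)
```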
Learning and generalization of complex tasks from unstructured demonstrations
TLDR: We present a method that segments unstructured demonstrations, recognizes repeated skills, and generalizes complex tasks, combining the advantages of recent automatic segmentation methods for learning from demonstration in a single principled, integrated framework.
  • Citations: 150
  • Highly influential citations: 7
Learning grounded finite-state representations from unstructured demonstrations
TLDR: Robots exhibit flexible behavior largely in proportion to their degree of knowledge about the world.
  • Citations: 149
  • Highly influential citations: 7
Incremental Semantically Grounded Learning from Demonstration
TLDR: We introduce a novel method for discovering semantically grounded primitives and incrementally building and improving a finite-state representation of a task in which various contingencies can arise.
  • Citations: 97
  • Highly influential citations: 6
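A minimal sketch of what an incrementally grown finite-state task representation could look like (illustrative only; the skill names and the segmentation step that produces them are assumed): nodes are grounded skills, edges record observed transitions, and a newly observed contingency simply adds edges or nodes.

```python
from collections import defaultdict

class TaskFSM:
    def __init__(self):
        self.transitions = defaultdict(set)  # skill -> set of successor skills

    def add_demonstration(self, skill_sequence):
        # Incrementally grow the graph from one segmented demonstration.
        for a, b in zip(skill_sequence, skill_sequence[1:]):
            self.transitions[a].add(b)

fsm = TaskFSM()
fsm.add_demonstration(["reach", "grasp", "place"])
fsm.add_demonstration(["reach", "regrasp", "grasp", "place"])  # a contingency
```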
Using Natural Language for Reward Shaping in Reinforcement Learning
TLDR: We propose the LanguagE-Action Reward Network (LEARN), a framework that maps free-form natural language instructions to intermediate rewards based on actions taken by the agent.
  • Citations: 28
  • Highly influential citations: 5
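A minimal sketch of language-based reward shaping in the spirit of LEARN. The real model is a trained network; `relatedness` below is a hypothetical word-overlap stand-in for the learned score of how well recent actions match the instruction.

```python
def relatedness(action_history, instruction):
    # Hypothetical heuristic: fraction of recent actions whose names appear
    # in the instruction (LEARN instead uses a trained network).
    words = instruction.lower().split()
    return sum(a in words for a in action_history) / max(len(action_history), 1)

def shaped_reward(env_reward, action_history, instruction, lam=0.1):
    # Environment reward plus a language-based intermediate reward.
    return env_reward + lam * relatedness(action_history, instruction)

r = shaped_reward(0.0, ["jump", "left"], "jump over the hole then go left")
```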
Active Reward Learning from Critiques
  • Yuchen Cui, S. Niekum
  • IEEE International Conference on Robotics and Automation (ICRA)
  • 1 May 2018
TLDR: We propose a novel trajectory-based active Bayesian inverse reinforcement learning algorithm that queries the user for critiques of automatically generated trajectories, rather than asking for demonstrations or action labels at states with high expected information gain.
  • Citations: 24
  • Highly influential citations: 4
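A minimal sketch of one way such an active query could be chosen (illustrative; the paper's Bayesian IRL machinery is replaced here by a toy posterior sample of linear reward hypotheses): show the user the candidate trajectory on which the current hypotheses disagree most, since a critique there is most informative.

```python
import numpy as np

rng = np.random.default_rng(0)
reward_samples = rng.normal(size=(50, 4))  # 50 sampled reward weight vectors
trajectories = rng.normal(size=(10, 4))    # feature counts of 10 candidate trajectories

returns = trajectories @ reward_samples.T  # (10, 50): return under each hypothesis
good = returns > 0                         # does each hypothesis "approve" the trajectory?
p = good.mean(axis=1)                      # probability the trajectory is good
entropy = -p * np.log(p + 1e-9) - (1 - p) * np.log(1 - p + 1e-9)

query = int(np.argmax(entropy))            # trajectory to show the user for critique
```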
Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation
TLDR: We propose two bootstrapping off-policy evaluation methods that use learned MDP transition models to estimate lower confidence bounds on policy performance with limited data in both continuous and discrete state spaces.
  • Citations: 34
  • Highly influential citations: 3
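A minimal sketch of a percentile-bootstrap lower confidence bound on policy value (illustrative; the paper additionally fits MDP transition models and evaluates the policy inside each bootstrapped model rather than resampling raw returns).

```python
import numpy as np

def bootstrap_lower_bound(returns, alpha=0.05, n_boot=2000, seed=0):
    """Percentile-bootstrap lower bound on the mean return."""
    rng = np.random.default_rng(seed)
    n = len(returns)
    means = [rng.choice(returns, size=n, replace=True).mean() for _ in range(n_boot)]
    return np.quantile(means, alpha)

returns = np.array([1.0, 0.4, 0.9, 0.2, 0.7, 0.8])
lb = bootstrap_lower_bound(returns)  # 95% lower bound on expected return
```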
Better-than-Demonstrator Imitation Learning via Automatically-Ranked Demonstrations
TLDR: We introduce Disturbance-based Reward Extrapolation (D-REX), a ranking-based imitation learning method that injects noise into a policy learned through behavioral cloning to automatically generate ranked demonstrations.
  • Citations: 8
  • Highly influential citations: 3
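A minimal sketch of D-REX-style automatic ranking (illustrative): roll out a cloned policy with increasing amounts of epsilon-greedy noise, on the assumption that more noise means worse behavior, which yields ranked trajectories without human labels. `bc_policy` and `env_step` are hypothetical stubs.

```python
import random

def bc_policy(state):
    return 0                  # stand-in for a behavioral-cloning policy

def env_step(state, action):
    return state + action     # stand-in for environment dynamics

def rollout(epsilon, horizon=20, n_actions=3):
    state, traj = 0, []
    for _ in range(horizon):
        if random.random() < epsilon:
            action = random.randrange(n_actions)  # inject noise
        else:
            action = bc_policy(state)
        traj.append((state, action))
        state = env_step(state, action)
    return traj

# Trajectories generated at lower epsilon are ranked above noisier ones.
ranked = [rollout(eps) for eps in (0.01, 0.25, 0.5, 0.75, 1.0)]
```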
Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations
TLDR: In this paper, we introduce a novel reward-learning-from-observation algorithm, Trajectory-ranked Reward EXtrapolation (T-REX), that extrapolates beyond a set of (approximately) ranked demonstrations in order to infer high-quality reward functions from a set of potentially poor demonstrations.
  • Citations: 51
  • Highly influential citations: 2
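A minimal sketch of the ranking objective behind T-REX for a linear reward r(s) = w · φ(s) (the paper uses a neural network): given a pair where one trajectory is ranked above another, minimize a Bradley-Terry style cross-entropy so the better trajectory receives higher total return. The feature matrices below are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=4)
phi_worse = rng.normal(size=(30, 4))   # features of the lower-ranked trajectory's states
phi_better = rng.normal(size=(30, 4))  # features of the higher-ranked trajectory's states

for _ in range(200):                   # plain gradient descent on one ranked pair
    r_worse = phi_worse.sum(0) @ w
    r_better = phi_better.sum(0) @ w
    p = 1.0 / (1.0 + np.exp(r_worse - r_better))  # P(better trajectory preferred)
    grad = -(1.0 - p) * (phi_better.sum(0) - phi_worse.sum(0))
    w -= 0.01 * grad

# Returns under the learned w can then rank trajectories beyond the demonstrations.
```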
Genetic Programming for Reward Function Search
TLDR: This paper presents a genetic programming algorithm to search for alternate reward functions for RL problems and describes classes of problems where it might be particularly useful.
  • Citations: 45
  • Highly influential citations: 2
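A minimal sketch of searching over reward functions with an evolutionary loop (illustrative; the paper evolves reward-function programs with genetic programming, whereas this toy composes fixed primitives and uses fresh random programs as a crude stand-in for mutation and crossover). `train_and_score` is a hypothetical fitness stub for "train an RL agent with this reward and measure task performance".

```python
import random

PRIMS = [lambda s: s, lambda s: -s, lambda s: s * s, lambda s: 1.0]

def random_reward():
    # Compose two primitives into a candidate reward function.
    f, g = random.choice(PRIMS), random.choice(PRIMS)
    return lambda s: f(s) + g(s)

def train_and_score(reward_fn):
    # Stand-in fitness: does the reward rank a known-good state above a bad one?
    return reward_fn(1.0) - reward_fn(-1.0)

population = [random_reward() for _ in range(20)]
for _ in range(10):  # evolve for a few generations
    population.sort(key=train_and_score, reverse=True)
    population = population[:10] + [random_reward() for _ in range(10)]

best = population[0]
```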