Corpus ID: 227248002

DERAIL: Diagnostic Environments for Reward And Imitation Learning

  • P. Freire, Adam Gleave, S. Toyer, Stuart Russell
  • Published 2020
  • Computer Science
  • ArXiv
  • The objective of many real-world tasks is complex and difficult to procedurally specify. This makes it necessary to use reward or imitation learning algorithms to infer a reward or policy directly from human data. Existing benchmarks for these algorithms focus on realism, testing in complex environments. Unfortunately, these benchmarks are slow, unreliable and cannot isolate failures. As a complementary approach, we develop a suite of simple diagnostic tasks that test individual facets of…

