Corpus ID: 212725139

DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction

@article{Kumar2020DisCorCF,
  title={DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction},
  author={Aviral Kumar and Abhishek Gupta and Sergey Levine},
  journal={ArXiv},
  year={2020},
  volume={abs/2003.07305}
}
  • Aviral Kumar, Abhishek Gupta, Sergey Levine
  • Published in ArXiv 2020
  • Mathematics, Computer Science
  • Deep reinforcement learning can learn effective policies for a wide range of tasks, but is notoriously difficult to use due to instability and sensitivity to hyperparameters. The reasons for this remain unclear. When using standard supervised methods (e.g., for bandits), on-policy data collection provides "hard negatives" that correct the model in precisely those states and actions that the policy is likely to visit. We call this phenomenon "corrective feedback." We show that bootstrapping…
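
The abstract is truncated above, but the "distribution correction" in the title refers to re-weighting the training distribution so that Bellman updates rely less on bootstrap targets that are themselves likely to be wrong. The following is a minimal, hypothetical tabular sketch of that idea, not the authors' implementation: the error-tracking table `delta`, the exponential weighting rule, and all constants are illustrative assumptions.

import numpy as np

def discor_weighted_q_update(Q, delta, transitions, gamma=0.99, tau=1.0, lr=0.1):
    """One sweep of error-weighted Bellman backups over stored transitions.

    Q     : np.ndarray of shape [S, A], current Q-value table
    delta : np.ndarray of shape [S, A], running estimate of how wrong the
            bootstrap targets at each (s, a) have been so far
    transitions : iterable of (s, a, r, s_next, done) tuples
    """
    for s, a, r, s_next, done in transitions:
        a_next = int(Q[s_next].argmax())
        target = r if done else r + gamma * Q[s_next, a_next]

        # Down-weight updates whose bootstrap target carries a large
        # accumulated error, mimicking a corrected training distribution.
        w = 1.0 if done else float(np.exp(-gamma * delta[s_next, a_next] / tau))

        td_error = target - Q[s, a]
        Q[s, a] += lr * w * td_error

        # Update the error estimate: current absolute Bellman error plus the
        # discounted error believed to be carried in from the target state.
        carried = 0.0 if done else gamma * delta[s_next, a_next]
        delta[s, a] += lr * (abs(td_error) + carried - delta[s, a])

    return Q, delta


# Tiny usage example on a made-up 3-state, 2-action problem.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Q = np.zeros((3, 2))
    delta = np.zeros((3, 2))
    buffer = [(int(rng.integers(3)), int(rng.integers(2)), float(rng.normal()),
               int(rng.integers(3)), bool(rng.integers(2))) for _ in range(100)]
    for _ in range(50):
        Q, delta = discor_weighted_q_update(Q, delta, buffer)
    print(Q)

The design choice sketched here is that ordinary experience replay treats all transitions equally, whereas weighting by an estimate of target error concentrates learning on transitions whose targets can be trusted.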

    References

    Publications referenced by this paper.
    Showing 1-10 of 58 references:

    • Diagnosing Bottlenecks in Deep Q-learning Algorithms (highly influential; 9 excerpts)
    • Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning (highly influential; 9 excerpts)
    • Prioritized Experience Replay (highly influential; 5 excerpts)
    • Human-level control through deep reinforcement learning (highly influential; 11 excerpts)
    • Error Bounds for Approximate Value Iteration (highly influential; 19 excerpts)