Corpus ID: 140101018

Delegative Reinforcement Learning: learning to avoid traps with a little help

@article{Kosoy2019DelegativeRL,
  title={Delegative Reinforcement Learning: learning to avoid traps with a little help},
  author={Vanessa Kosoy},
  journal={ArXiv},
  year={2019},
  volume={abs/1907.08461}
}
  • Vanessa Kosoy
  • Published in ArXiv 2019
  • Mathematics, Computer Science
  • Most known regret bounds for reinforcement learning are either episodic or assume an environment without traps. We derive a regret bound without making either assumption, by allowing the algorithm to occasionally delegate an action to an external advisor. We thus arrive at a setting of active one-shot model-based reinforcement learning that we call DRL (delegative reinforcement learning). The algorithm we construct in order to demonstrate the regret bound is a variant of Posterior Sampling…
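The core idea in the abstract — act by posterior sampling, but delegate to the advisor when the sampled action might be a trap under a credible hypothesis — can be illustrated with a toy sketch. This is an assumed, simplified illustration (hypothetical names and a bandit-style environment), not the paper's actual construction or regret-bound machinery:

```python
import random

# Hypothetical toy model: each hypothesis maps an action to (reward, is_trap).
HYPOTHESES = [
    {"left": (1.0, False), "right": (0.0, True)},   # hypothesis 0: "right" is a trap
    {"left": (0.2, False), "right": (1.0, False)},  # hypothesis 1: "right" is safe and best
]

def drl_step(posterior, advisor_action, threshold=0.05):
    """One step of a delegative-RL-style sketch (not the paper's exact algorithm):
    posterior-sample a hypothesis, act greedily under it, but delegate to the
    advisor whenever the chosen action is a trap under any hypothesis whose
    posterior mass exceeds `threshold`."""
    # Posterior sampling: draw one hypothesis in proportion to its posterior mass.
    hyp = random.choices(HYPOTHESES, weights=posterior)[0]
    action = max(hyp, key=lambda a: hyp[a][0])  # greedy action under the sample
    # Delegation test: could this action be a trap under a credible hypothesis?
    risky = any(h[action][1] and p > threshold
                for h, p in zip(HYPOTHESES, posterior))
    return advisor_action if risky else action
```

Under a uniform posterior, sampling hypothesis 1 would suggest "right", but since hypothesis 0 (with mass 0.5) marks "right" as a trap, the step delegates; once the posterior concentrates on hypothesis 1, the agent acts autonomously. This mirrors the abstract's point that delegation is only needed occasionally, while uncertainty about traps remains.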

