
Value function estimation in Markov reward processes: Instance-dependent ℓ∞-bounds for policy evaluation

@article{Pananjady2019ValueFE,
  title={Value function estimation in Markov reward processes: Instance-dependent ℓ∞-bounds for policy evaluation},
  author={A. Pananjady and M. Wainwright},
  journal={ArXiv},
  year={2019},
  volume={abs/1909.08749}
}
Markov reward processes (MRPs) are used to model stochastic phenomena arising in operations research, control engineering, robotics, artificial intelligence, as well as communication and transportation networks. In many of these cases, such as in the policy evaluation problem encountered in reinforcement learning, the goal is to estimate the long-term value function of such a process without access to the underlying population transition and reward functions. Working with samples generated…
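For context, the value function of a discounted MRP with transition matrix P, reward vector r, and discount factor γ satisfies the Bellman equation V = r + γPV, so that V = (I − γP)⁻¹r. The Python sketch below illustrates the generic plug-in approach to the sample-based setting described in the abstract: form empirical estimates of P and r from observed transitions and rewards, then solve the empirical Bellman equation. This is only an illustrative sketch of the problem setup, not the authors' estimator or their instance-dependent ℓ∞ analysis; all names (plugin_value_estimate, num_states, gamma, etc.) are hypothetical.

```python
import numpy as np

def plugin_value_estimate(transitions, rewards, num_states, gamma=0.9):
    """Generic plug-in estimate of a discounted MRP value function.

    transitions: iterable of (state, next_state) pairs sampled from the chain.
    rewards:     iterable of (state, observed_reward) pairs.
    Returns V_hat solving the empirical Bellman equation V = r_hat + gamma * P_hat @ V.
    """
    # Empirical transition matrix P_hat from observed state transitions.
    counts = np.zeros((num_states, num_states))
    for s, s_next in transitions:
        counts[s, s_next] += 1.0
    row_sums = counts.sum(axis=1, keepdims=True)
    # States never visited get a uniform row so P_hat stays a valid stochastic matrix.
    P_hat = np.where(row_sums > 0, counts / np.maximum(row_sums, 1.0), 1.0 / num_states)

    # Empirical mean reward r_hat per state (zero for unvisited states).
    reward_sums = np.zeros(num_states)
    reward_counts = np.zeros(num_states)
    for s, r in rewards:
        reward_sums[s] += r
        reward_counts[s] += 1.0
    r_hat = reward_sums / np.maximum(reward_counts, 1.0)

    # Solve (I - gamma * P_hat) V = r_hat; the matrix is invertible for gamma < 1.
    return np.linalg.solve(np.eye(num_states) - gamma * P_hat, r_hat)

# Hypothetical usage on a two-state chain:
# V_hat = plugin_value_estimate([(0, 1), (1, 1), (1, 0)], [(0, 1.0), (1, 0.0)], num_states=2)
```

The paper's contribution concerns how the ℓ∞ error ‖V̂ − V‖∞ of such sample-based estimators depends on the specific instance (P, r); the sketch above only sets up the estimation problem.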
