
The Gambler's Problem and Beyond

@article{Wang2020TheGP,
  title={The Gambler's Problem and Beyond},
  author={Baoxiang Wang and Shuai Li and Jiajin Li and Siu On Chan},
  journal={ArXiv},
  year={2020},
  volume={abs/2001.00102}
}
We analyze the Gambler's problem, a simple reinforcement learning problem in which the gambler may repeatedly double or lose their bet until a target capital is reached. It is an early example introduced in the reinforcement learning textbook of \cite{sutton2018reinforcement}, where the authors note an interesting pattern in the optimal value function, with high-frequency components and repeating non-smooth points, but leave it without further investigation. We provide the exact formula for the optimal value…
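To make the setting concrete, below is a minimal value-iteration sketch of the discrete Gambler's problem as posed in Sutton and Barto's Example 4.3. The head probability, target capital, and tolerance are illustrative choices for this sketch, not values taken from the paper.

```python
import numpy as np

P_HEAD = 0.4    # probability that a coin flip wins the bet (illustrative)
TARGET = 100    # goal capital; reaching it ends the episode with reward 1
THETA = 1e-12   # convergence tolerance for value iteration (illustrative)

# v[s] approximates the optimal value of holding capital s.
# Boundary conditions: v[0] = 0 (ruin), v[TARGET] = 1 (goal reached).
v = np.zeros(TARGET + 1)
v[TARGET] = 1.0

while True:
    delta = 0.0
    for s in range(1, TARGET):
        # Legal stakes: at least 1, never more than the current capital,
        # and never more than is needed to reach the target.
        stakes = range(1, min(s, TARGET - s) + 1)
        # Bellman optimality backup: expectation over the coin flip.
        best = max(P_HEAD * v[s + a] + (1.0 - P_HEAD) * v[s - a]
                   for a in stakes)
        delta = max(delta, abs(best - v[s]))
        v[s] = best
    if delta < THETA:
        break

# With P_HEAD < 0.5 the converged curve exhibits the repeating non-smooth
# points the abstract mentions, e.g. visible kinks at s = 25, 50, 75.
print(v[25], v[50], v[75])
```

Plotting the converged `v` against the capital `s` reproduces the high-frequency, self-similar curve described in the abstract, whose exact closed form is the subject of the paper.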

References

Reinforcement Learning: An Introduction
This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, ranging from the history of the field's intellectual foundations to the most recent developments and applications.
The Axiom of Choice
We propose that failures of the axiom of choice, that is, surjective functions admitting no sections, can be reasonably classified by means of invariants borrowed from algebraic topology. We show…
Sur l'équation fonctionnelle f(x+y)=f(x)+f(y) [On the functional equation f(x+y)=f(x)+f(y)]
Sur les fonctions convexes mesurables [On measurable convex functions]
Algorithms for Reinforcement Learning
This book focuses on those reinforcement learning algorithms that build on the powerful theory of dynamic programming; it gives a fairly comprehensive catalog of learning problems and describes the core ideas, followed by a discussion of their theoretical properties and limitations.
The Cantor function
This is an attempt to give a systematic survey of properties of the famous Cantor ternary function.
Hill Climbing on Value Estimates for Search-control in Dyna
This work proposes to generate states using the trajectory obtained from hill climbing on the current estimate of the value function, and finds that the benefit comes specifically from samples generated by climbing from low-value to high-value regions.
Python implementation of Reinforcement Learning: An Introduction, 2019
URL https://youtu.be/aFXdpCDAG2g?t=395, 2019. The plot of the empirical optimal value function of the Mountain Car problem first appears at 6:35. Some follow-up discussions start at…