# The Gambler's Problem and Beyond

@article{Wang2020TheGP, title={The Gambler's Problem and Beyond}, author={Baoxiang Wang and Shuai Li and Jiajin Li and Siu On Chan}, journal={ArXiv}, year={2020}, volume={abs/2001.00102} }

We analyze the Gambler's problem, a simple reinforcement learning problem in which the gambler can repeatedly double or lose their stake until a target is reached. It is an early example from the reinforcement learning textbook of \cite{sutton2018reinforcement}, where the authors note an interesting pattern in the optimal value function, with high-frequency components and repeating non-smooth points, but do not investigate it further. We provide the exact formula for the optimal value…
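The setting described above can be computed numerically by standard value iteration, as in Example 4.3 of the Sutton and Barto textbook. The sketch below is not the paper's closed-form solution; it assumes (parameters not stated in the abstract) a coin with head probability `p_h = 0.4`, a goal of 100, reward 1 only upon reaching the goal, and no discounting:

```python
# Value iteration for the Gambler's problem (Sutton & Barto, Example 4.3).
# Assumed parameters, not taken from the abstract: head probability p_h = 0.4,
# goal capital = 100, reward 1 only on reaching the goal, discount factor 1.

GOAL = 100
P_H = 0.4

def value_iteration(p_h=P_H, goal=GOAL, theta=1e-10):
    # V[s] estimates the probability of eventually reaching the goal
    # from capital s under the optimal betting policy.
    V = [0.0] * (goal + 1)
    V[goal] = 1.0  # terminal winning state
    while True:
        delta = 0.0
        for s in range(1, goal):
            # A stake is limited by the current capital and by the
            # distance to the goal (betting more is never useful).
            best = max(
                p_h * V[s + a] + (1 - p_h) * V[s - a]
                for a in range(1, min(s, goal - s) + 1)
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            return V

if __name__ == "__main__":
    V = value_iteration()
    # From capital 50 the optimal play is to bet everything,
    # so the winning probability equals p_h.
    print(round(V[50], 4))
```

Plotting the resulting `V` reproduces the non-smooth, self-similar shape mentioned in the abstract: sharp kinks appear at capitals 12.5, 25, 50, and their dyadic subdivisions.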

#### References

Showing 1–10 of 39 references

Reinforcement Learning: An Introduction

- IEEE Transactions on Neural Networks, 2005

This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, ranging from the history of the field's intellectual foundations to the most recent developments and applications.

The Axiom of Choice

- 2003

We propose that failures of the axiom of choice, that is, surjective functions admitting no sections, can be reasonably classified by means of invariants borrowed from algebraic topology. We show…

Algorithms for Reinforcement Learning

- 2010

This book focuses on those algorithms of reinforcement learning that build on the powerful theory of dynamic programming, gives a fairly comprehensive catalog of learning problems, and describes the core ideas, followed by a discussion of their theoretical properties and limitations.

The Cantor function

- 2005

This is an attempt to give a systematic survey of properties of the famous Cantor ternary function.

Hill Climbing on Value Estimates for Search-control in Dyna

- IJCAI, 2019

This work proposes to generate states by using the trajectory obtained from hill climbing on the current estimate of the value function, and finds that there appears to be a benefit specifically from using samples generated by climbing on current value estimates from low-value to high-value regions.

Python Implementation of Reinforcement Learning: An Introduction

- 2019

URL https://youtu.be/aFXdpCDAG2g?t=395. The plot of the empirical optimal value function of the Mountain Car problem first appears at 6:35. Some follow-up discussions start at