- John N. Tsitsiklis, Benjamin Van Roy
- NIPS
- 1996

We discuss the temporal-difference learning algorithm, as applied to approximating the cost-to-go function of an infinite-horizon discounted Markov chain. The algorithm we analyze updates parameters…
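The truncated abstract refers to TD(λ) with a linear approximation of the cost-to-go function, `J(x) ≈ phi(x)' r`. Below is a minimal sketch of that update. It is not the authors' code, and it rests on a few assumptions. The chain is finite with transition matrix `P`. The cost `g(x)` depends only on the current state, whereas the paper allows costs that depend on the transition. The step-size schedule `20 / (100 + t)` is an illustrative choice. The function name `td_lambda` is hypothetical.

```python
import numpy as np


def td_lambda(P, g, phi, alpha, lam, n_steps, seed=0):
    """Hypothetical helper: TD(lambda) with linear features on one simulated trajectory.

    P:   (n, n) transition matrix of the Markov chain
    g:   (n,) cost incurred in each state (assumption: state-only costs)
    phi: (n, k) feature matrix; row x is phi(x)
    Returns the weight vector r, so that J(x) is approximated by phi(x) @ r.
    """
    rng = np.random.default_rng(seed)
    n_states, k = phi.shape
    cum_P = np.cumsum(P, axis=1)   # row-wise CDFs used to sample next states
    u = rng.random(n_steps)        # uniforms drawn up front
    r = np.zeros(k)                # parameter vector r_t
    z = np.zeros(k)                # eligibility trace z_t
    x = 0                          # arbitrary initial state
    for t in range(n_steps):
        # Sample x_{t+1} ~ P(x_t, .); min() guards against rounding in the last CDF entry.
        x_next = min(int(np.searchsorted(cum_P[x], u[t])), n_states - 1)
        # Temporal difference d_t = g(x_t) + alpha * phi(x_{t+1})'r_t - phi(x_t)'r_t
        d = g[x] + alpha * phi[x_next] @ r - phi[x] @ r
        # Eligibility trace z_t = alpha * lam * z_{t-1} + phi(x_t)
        z = alpha * lam * z + phi[x]
        step = 20.0 / (100.0 + t)  # diminishing step size (illustrative choice)
        r = r + step * d * z
        x = x_next
    return r


# Two-state example with one indicator feature per state (tabular case).
P = np.array([[0.9, 0.1], [0.2, 0.8]])
g = np.array([1.0, 0.0])
alpha = 0.9
r = td_lambda(P, g, np.eye(2), alpha=alpha, lam=0.5, n_steps=200_000)
J_exact = np.linalg.solve(np.eye(2) - alpha * P, g)   # exact cost-to-go
```

With one indicator feature per state, the sketch reduces to tabular TD(λ). The weights should therefore settle near the exact cost-to-go `(I - alpha P)^{-1} g`, which is about (7.57, 4.86) here. The match is only approximate because a single simulated trajectory drives the updates.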

- Daniela Pucci de Farias, Benjamin Van Roy
- Operations Research
- 2003

The curse of dimensionality gives rise to prohibitive computational requirements that render infeasible the exact solution of large-scale stochastic control problems. We study an efficient method…

Many real-world tasks require multiple decision makers (agents) to coordinate their actions in order to achieve common long-term goals. Examples include: manufacturing systems, where managers of a…

- Daniela Pucci de Farias, Benjamin Van Roy
- Math. Oper. Res.
- 2004

In the linear programming approach to approximate dynamic programming, one tries to solve a certain linear program, the ALP, which has a relatively small number K of variables but an intractable…
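The following sketch makes the ALP concrete for a cost-minimizing discounted MDP. It maximizes `c' Phi r` subject to `Phi r <= g_a + alpha P_a Phi r` for every action `a`, using SciPy's `linprog`. It rests on several assumptions:

- The MDP is small and randomly generated.
- Every constraint is enumerated, which is feasible only because there are 6 states.
- The three-column polynomial basis `Phi` (so K = 3) and the uniform state-relevance weights `c` are hypothetical choices.
- The helper names are ours.

```python
import numpy as np
from scipy.optimize import linprog


def solve_alp(P, g, Phi, c, alpha):
    """Hypothetical helper: approximate LP for a cost-minimizing discounted MDP.

    maximize    c' Phi r
    subject to  (Phi r)(x) <= g_a(x) + alpha * (P_a Phi r)(x)   for every state x and action a

    P: (A, n, n) transition matrices; g: (A, n) costs;
    Phi: (n, K) basis functions; c: (n,) state-relevance weights.
    """
    n_actions = P.shape[0]
    # One block of n constraints per action: (Phi - alpha * P_a Phi) r <= g_a
    A_ub = np.vstack([Phi - alpha * P[a] @ Phi for a in range(n_actions)])
    b_ub = np.concatenate([g[a] for a in range(n_actions)])
    res = linprog(
        -(c @ Phi),                             # linprog minimizes, so negate the objective
        A_ub=A_ub,
        b_ub=b_ub,
        bounds=[(None, None)] * Phi.shape[1],   # weights are free (linprog defaults to r >= 0)
        method="highs",
    )
    if not res.success:
        raise RuntimeError(res.message)
    return res.x


def value_iteration(P, g, alpha, n_iter=2000):
    """Exact optimal cost-to-go J*, used only as a reference."""
    J = np.zeros(P.shape[1])
    for _ in range(n_iter):
        J = np.min(g + alpha * P @ J, axis=0)
    return J


# Small random MDP (illustrative data).
rng = np.random.default_rng(0)
n_states, n_actions, alpha = 6, 2, 0.9
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)          # make each row a probability distribution
g = rng.random((n_actions, n_states))
c = np.full(n_states, 1.0 / n_states)      # uniform state-relevance weights
J_star = value_iteration(P, g, alpha)

x = np.arange(n_states, dtype=float)
Phi = np.column_stack([np.ones(n_states), x, x**2])   # hypothetical K = 3 basis
r = solve_alp(P, g, Phi, c, alpha)
```

Any `r` that satisfies the constraints gives a pointwise lower bound, `Phi r <= J*`. With one basis function per state, the ALP recovers `J*` exactly. Both properties are easy to check against the value-iteration reference.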

- John N. Tsitsiklis, Benjamin Van Roy
- IEEE Trans. Automat. Contr.
- 1999

We develop a theory characterizing optimal stopping times for discrete-time ergodic Markov processes with discounted rewards. The theory differs from prior work by its view of per-stage and terminal…
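Consider a finite-state analogue of this setting, with discounted rewards, a per-stage reward `g` for continuing and a terminal reward `G` for stopping. There the continuation value satisfies `Q(x) = g(x) + alpha * sum_y P(x, y) max(G(y), Q(y))`, and stopping at `x` is optimal when `G(x) >= Q(x)`. The sketch below simply iterates that equation. The paper treats general ergodic processes, so the finite chain, the toy data and the function name `stopping_q_iteration` are illustrative assumptions.

```python
import numpy as np


def stopping_q_iteration(P, g, G, alpha, tol=1e-12, max_iter=100_000):
    """Hypothetical helper: continuation values of a finite-state discounted stopping problem.

    Iterates Q(x) = g(x) + alpha * sum_y P(x, y) * max(G(y), Q(y)),
    where g is the per-stage reward for continuing and G the terminal reward for stopping.
    The map is a sup-norm contraction with modulus alpha, so the iteration converges.
    """
    Q = np.zeros(len(G))
    for _ in range(max_iter):
        Q_new = g + alpha * P @ np.maximum(G, Q)
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new
    raise RuntimeError("did not converge")


P = np.full((3, 3), 1.0 / 3.0)    # toy chain: uniform transitions
g = np.zeros(3)                   # no per-stage reward
G = np.array([0.0, 1.0, 10.0])    # terminal reward for stopping in each state
Q = stopping_q_iteration(P, g, G, alpha=0.9)
stop = G >= Q                     # stop when stopping is worth at least as much as continuing
```

Every row of the toy `P` is uniform, so `Q` equals the same constant `q` in every state and solves `q = 0.9 (q + q + 10) / 3`. That gives `q = 7.5`, so stopping is optimal only in the state whose terminal reward is 10.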

Efficient exploration in complex environments remains a major challenge for reinforcement learning. We propose bootstrapped DQN, a simple algorithm that explores in a computationally and…

- John N. Tsitsiklis, Benjamin Van Roy
- Machine Learning
- 1996

We develop a methodological framework and present a few different ways in which dynamic programming and compact representations can be combined to solve large-scale stochastic control problems. In…
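State aggregation is one simple way to combine value iteration with a compact representation. States that share a feature value share one parameter, and each Bellman backup is averaged back onto the clusters. Averaging is a sup-norm non-expansion, so the combined iteration remains a contraction. The sketch below is illustrative and is not taken from the paper. The random MDP, the two-cluster feature and the function name are hypothetical.

```python
import numpy as np


def aggregated_value_iteration(P, g, alpha, clusters, n_iter=2000):
    """Hypothetical helper: value iteration with a piecewise-constant (aggregated) representation.

    P: (A, n, n) transitions; g: (A, n) costs;
    clusters: (n,) ints mapping each state to its cluster (every cluster must be nonempty).
    Returns one value per cluster.
    """
    n_clusters = int(clusters.max()) + 1
    counts = np.bincount(clusters, minlength=n_clusters)
    r = np.zeros(n_clusters)
    for _ in range(n_iter):
        J = r[clusters]                              # expand the compact representation to all states
        TJ = np.min(g + alpha * P @ J, axis=0)       # one Bellman backup per state
        # Project back: average the backed-up values within each cluster.
        r = np.bincount(clusters, weights=TJ, minlength=n_clusters) / counts
    return r


rng = np.random.default_rng(1)
P = rng.random((2, 6, 6))
P /= P.sum(axis=2, keepdims=True)
g = rng.random((2, 6))
clusters = np.array([0, 0, 0, 1, 1, 1])             # hypothetical feature: first half vs second half
r = aggregated_value_iteration(P, g, 0.9, clusters)
J_exact = aggregated_value_iteration(P, g, 0.9, np.arange(6))   # singleton clusters = exact value iteration
```

With singleton clusters the scheme is ordinary value iteration. Coarser clusters trade accuracy for fewer parameters: the result is the fixed point of "back up, then average within each cluster" rather than the exact cost-to-go.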

- Ciamac C. Moallemi, Benjamin Van Roy
- IEEE Transactions on Information Theory
- 2005

We propose consensus propagation, an asynchronous distributed protocol for averaging numbers across a network. We establish convergence, characterize the convergence rate for regular graphs, and…
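Consensus propagation can be viewed as a special case of belief propagation. It performs min-sum message passing on the quadratic objective `sum_i (x_i - y_i)^2 + beta * sum_{(i,j) in E} (x_i - x_j)^2`, whose minimizer approaches the average of the `y_i` as `beta` grows. The sketch below derives the message updates from that objective. It updates all messages synchronously, whereas the protocol in the paper is asynchronous. The function name and the example graphs are hypothetical.

```python
def consensus_propagation(y, edges, beta, n_iter):
    """Hypothetical helper: synchronous min-sum message passing for
        minimize  sum_i (x_i - y_i)^2 + beta * sum_{(i,j) in edges} (x_i - x_j)^2.

    Each directed message i -> j is a quadratic K_ij * (x_j - mu_ij)^2 (constants dropped).
    Returns every node's estimate of its x_i.
    """
    n = len(y)
    nbrs = [set() for _ in range(n)]
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    K = {(i, j): 0.0 for i in range(n) for j in nbrs[i]}   # zero curvature = uninformative message
    mu = {(i, j): 0.0 for i in range(n) for j in nbrs[i]}

    def combine(i, exclude=None):
        """Curvature and mean of node i's local term plus incoming messages (except from `exclude`)."""
        others = [u for u in nbrs[i] if u != exclude]
        a = 1.0 + sum(K[(u, i)] for u in others)
        m = (y[i] + sum(K[(u, i)] * mu[(u, i)] for u in others)) / a
        return a, m

    for _ in range(n_iter):
        new_K, new_mu = {}, {}
        for (i, j) in K:
            a, m = combine(i, exclude=j)
            # min over x_i of a*(x_i - m)^2 + beta*(x_i - x_j)^2 equals (a*beta/(a+beta)) * (x_j - m)^2
            new_K[(i, j)] = a * beta / (a + beta)
            new_mu[(i, j)] = m
        K, mu = new_K, new_mu   # combine() reads the rebound K and mu through its closure
    return [combine(i)[1] for i in range(n)]


y = [1.0, 2.0, 3.0, 4.0]
path = [(0, 1), (1, 2), (2, 3)]
est = consensus_propagation(y, path, beta=1e6, n_iter=10)
```

On a tree such as this path, min-sum is exact once messages have crossed the graph, so with a large `beta` every node reports approximately 2.5. Take instead two nodes holding 0 and 1 with `beta = 1`. The estimates are 1/3 and 2/3, the exact minimizer of the objective.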

- John N. Tsitsiklis, Benjamin Van Roy
- IEEE Trans. Neural Networks
- 2001

We introduce and analyze a simulation-based approximate dynamic programming method for pricing complex American-style options, with a possibly high-dimensional underlying state space. We work within…
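Simulation-based approximate dynamic programming for American-style options can be sketched as backward induction with regression. At each exercise date, fit continuation values on a few basis functions, then take the maximum with the immediate payoff. The sketch below applies this idea to a Bermudan put on a single asset under risk-neutral geometric Brownian motion. The dynamics, the basis `(1, s/K, (s/K)^2)`, the parameter values and the function name are all illustrative assumptions. The paper's method targets possibly high-dimensional state spaces.

```python
import numpy as np


def price_american_put(s0, strike, rate, sigma, maturity, n_steps, n_paths, seed=0):
    """Hypothetical helper: regression-based backward induction for a Bermudan put.

    At each exercise date t, continuation values Q_t(s) are fitted by least squares on
    the basis (1, s/K, (s/K)^2) to discounted next-date values max(payoff, Q_{t+1}).
    """
    rng = np.random.default_rng(seed)
    dt = maturity / n_steps
    disc = np.exp(-rate * dt)
    # Risk-neutral geometric Brownian motion paths, shape (n_paths, n_steps + 1).
    z = rng.standard_normal((n_paths, n_steps))
    log_steps = (rate - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    log_paths = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(log_steps, axis=1)], axis=1)
    s = s0 * np.exp(log_paths)

    def payoff(x):
        return np.maximum(strike - x, 0.0)

    def basis(x):
        u = x / strike
        return np.column_stack([np.ones_like(u), u, u**2])

    value = payoff(s[:, -1])                    # at maturity: exercise if in the money
    for t in range(n_steps - 1, 0, -1):
        phi = basis(s[:, t])
        coef, *_ = np.linalg.lstsq(phi, disc * value, rcond=None)
        continuation = phi @ coef               # fitted Q_t(s_t)
        value = np.maximum(payoff(s[:, t]), continuation)
    # At t = 0 every path starts at s0, so the continuation value is a sample mean.
    return float(max(payoff(s0), disc * value.mean()))


price = price_american_put(36.0, 40.0, 0.06, 0.2, 1.0, n_steps=25, n_paths=20_000)
```

Fitted continuation values enter through a maximum, and the same paths serve for both fitting and valuation, so estimates of this kind can be biased upward. With `sigma = 0` the paths are deterministic. For these parameters holding is then never worth more than exercising, and the function returns the intrinsic value 4.0.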

- Ian Osband, Daniel Russo, Benjamin Van Roy
- NIPS
- 2013

Most provably-efficient reinforcement learning algorithms introduce optimism about poorly-understood states and actions to encourage exploration. We study an alternative approach for efficient…