It is shown here that stability of the stochastic approximation algorithm is implied by the asymptotic stability of the origin for an associated o.d.e. This in turn implies convergence of the algorithm. Several specific classes of algorithms are considered as applications. It is found that the results provide (i) a simpler derivation of known results for …
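The ODE approach summarized above can be illustrated with a minimal scalar sketch (a hypothetical example, not from the paper): the iterates x_{n+1} = x_n + a_n (h(x_n) + M_{n+1}) track the mean-field ODE dx/dt = h(x), and when the origin of that ODE is globally asymptotically stable the iterates remain bounded and converge to it.

```python
import numpy as np

rng = np.random.default_rng(42)

def h(x):
    # mean field of the iteration; the associated ODE is dx/dt = -x,
    # whose origin is globally asymptotically stable
    return -x

x = 5.0                        # arbitrary initial condition
N = 200_000
for n in range(1, N + 1):
    a_n = 1.0 / n              # step sizes with sum a_n = inf, sum a_n^2 < inf
    noise = rng.normal()       # zero-mean (martingale-difference) noise
    x = x + a_n * (h(x) + noise)

# after N steps, x is close to the ODE's equilibrium at the origin
```

With these tapering step sizes the noise is averaged out, so the trajectory of the algorithm shadows the ODE flow despite the per-step randomness.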
It is known that state-dependent, multi-step Lyapunov bounds lead to greatly simplified verification theorems for stability for large classes of Markov chain models. This is one component of the "fluid model" approach to stability of stochastic networks. In this paper we extend the general theory to randomized multi-step Lyapunov theory to obtain …
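For reference, one standard form of the Foster-Lyapunov drift conditions alluded to above, together with a state-dependent multi-step variant (generic notation, not necessarily the paper's):

```latex
% Single-step drift: V : X -> [0, infinity), f >= 1, "small" set C, b < infinity
E\bigl[V(\Phi_{n+1}) \mid \Phi_n = x\bigr] \le V(x) - f(x) + b\,\mathbf{1}_C(x)

% State-dependent multi-step variant: the drift need only hold after
% n(x) >= 1 steps when starting from state x
E\bigl[V(\Phi_{n(x)}) \mid \Phi_0 = x\bigr] \le V(x) - n(x)\,f(x) + b\,\mathbf{1}_C(x)
```

The multi-step form is what makes fluid-model arguments work: stability of the fluid limit gives a drift estimate only over a long, state-dependent horizon, not at every single step.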
We describe an exact dynamic programming update for constrained partially observable Markov decision processes (CPOMDPs). State-of-the-art exact solution of unconstrained POMDPs relies on implicit enumeration of the vectors in the piecewise linear value function, and pruning operations to obtain a minimal representation of the updated value function. In …
This paper concerns the structure of optimal codes for stochastic channel models. An investigation of an associated dual convex program reveals that the optimal distribution in channel coding is typically discrete. Based on this observation we obtain the following theoretical conclusions, as well as new algorithms for constructing capacity-achieving …
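As a concrete illustration of computing a capacity-achieving input distribution, here is a sketch of the standard Blahut-Arimoto iteration (a classical algorithm, not the construction developed in the paper), assuming a strictly positive channel matrix for simplicity:

```python
import numpy as np

def blahut_arimoto(P, iters=200):
    """Capacity (in bits) and optimal input law for a channel P[x, y] = p(y|x)."""
    nx = P.shape[0]
    r = np.full(nx, 1.0 / nx)               # start from the uniform input law
    for _ in range(iters):
        q = r[:, None] * P                   # unnormalized posterior p(x | y)
        q /= q.sum(axis=0, keepdims=True)
        # multiplicative update: r(x) proportional to exp( sum_y p(y|x) ln q(x|y) )
        log_r = np.sum(P * np.log(q), axis=1)
        r = np.exp(log_r - log_r.max())
        r /= r.sum()
    p_y = r @ P                              # induced output law
    C = np.sum(r[:, None] * P * np.log2(P / p_y[None, :]))  # mutual information
    return C, r

# binary symmetric channel with crossover probability 0.1:
# capacity is 1 - H_2(0.1), about 0.531 bits, at the uniform input
bsc = np.array([[0.9, 0.1], [0.1, 0.9]])
C, r = blahut_arimoto(bsc)
```

For the symmetric channel the uniform input is already optimal, so the iteration is stationary; for asymmetric channels the same update converges monotonically to the capacity-achieving law.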
This paper establishes new criteria for stability and for instability of multiclass network models under a given stationary policy. It also extends previous results on the approximation of the solution to the average cost optimality equations through an associated fluid model: It is shown that an optimized network possesses a fluid limit model which is …
This paper considers in parallel the scheduling problem for multi-class queueing networks, and optimization of Markov decision processes. It is shown that the value iteration algorithm may perform poorly when the algorithm is not initialized properly. The most typical case where the initial value function is taken to be zero may be a particularly bad …
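For context, a minimal value iteration sketch on a toy discounted MDP (hypothetical numbers, for illustration only; the point made above is that for average-cost and network problems the choice of initial value function V_0 matters, and V_0 = 0 can be a poor default):

```python
import numpy as np

# toy MDP: 2 states, 2 actions (illustrative data, not from the paper)
# P[a, s, s'] = transition probability under action a; c[s, a] = one-step cost
P = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.5, 0.5], [0.9, 0.1]]])
c = np.array([[1.0, 2.0], [0.5, 3.0]])
gamma = 0.9

V = np.zeros(2)  # the "typical" zero initialization discussed above
for _ in range(500):
    # Bellman update: V(s) = min_a [ c(s, a) + gamma * sum_s' P(s'|s, a) V(s') ]
    Q = c + gamma * np.einsum('ast,t->sa', P, V)
    V = Q.min(axis=1)
```

After enough iterations V satisfies the Bellman equation to within numerical precision; what a poor initialization costs is the transient, which for average-cost formulations can dominate practical run times.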
We address the problem of computing the optimal Q-function in Markov decision problems with infinite state-space. We analyze the convergence properties of several variations of Q-learning when combined with function approximation, extending the analysis of TD-learning in (Tsitsiklis & Van Roy, 1996a) to stochastic control settings. We identify …
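A minimal sketch of Q-learning with linear function approximation on a hypothetical toy MDP (not from the paper): the parameters θ are updated along the feature vector φ(s, a), and with one-hot features the update reduces to ordinary tabular Q-learning, the regime where convergence is unproblematic.

```python
import numpy as np

rng = np.random.default_rng(0)

# deterministic toy MDP: 2 states, 2 actions (illustrative only)
nxt = np.array([[0, 1], [0, 1]])        # nxt[s, a] = successor state
R = np.array([[0.0, 1.0], [0.0, 2.0]])  # R[s, a] = one-step reward
gamma = 0.9

def phi(s, a):
    """One-hot feature vector for the state-action pair (s, a)."""
    f = np.zeros(4)
    f[2 * s + a] = 1.0
    return f

theta = np.zeros(4)                     # linear parameters: Q(s, a) ~ phi(s, a) . theta
alpha = 0.5
for _ in range(20_000):
    s, a = rng.integers(2), rng.integers(2)  # exploratory state-action sampling
    s2, r = nxt[s, a], R[s, a]
    # temporal-difference error against the greedy one-step target
    target = r + gamma * max(phi(s2, 0) @ theta, phi(s2, 1) @ theta)
    delta = target - phi(s, a) @ theta
    theta += alpha * delta * phi(s, a)
```

With general (non-one-hot) features the same update can diverge, which is precisely why the convergence analysis referenced above is needed.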
In Part I we developed stability concepts for discrete chains, together with Foster-Lyapunov criteria for them to hold. Part II was devoted to developing related stability concepts for continuous-time processes. In this paper we develop criteria for these forms of stability for continuous-parameter Markovian processes on general state spaces, based on …
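One commonly used continuous-time drift condition of this type, stated via the extended generator 𝒜 of the process (generic notation; the paper's precise conditions may differ):

```latex
% Exponential-ergodicity drift: c > 0, b < infinity, C a "small" set
\mathcal{A} V(x) \le -c\,V(x) + b\,\mathbf{1}_C(x)
```

Outside the set C the Lyapunov function decays at an exponential rate along the process, which is the continuous-parameter analogue of the discrete-time drift inequalities of Part I.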