It is shown here that stability of the stochastic approximation algorithm is implied by the asymptotic stability of the origin for an associated ODE. This in turn implies convergence of the algorithm. Several specific classes of algorithms are considered as applications. It is found that the results provide (i) a simpler derivation of known results for …
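The ODE method described above can be illustrated with a minimal sketch. Everything here is illustrative rather than taken from the paper: the mean field h(θ) = −θ (so the associated ODE is θ̇ = −θ, with the origin asymptotically stable), the Gaussian noise, and all constants are hypothetical.

```python
import random

def stochastic_approximation(h_noisy, theta0, n_steps=10_000, seed=0):
    """Robbins-Monro recursion theta_{n+1} = theta_n + a_n * h_noisy(theta_n),
    with diminishing step sizes a_n = 1/(n+1).  When the ODE
    d/dt theta = E[h_noisy(theta)] has an asymptotically stable origin,
    the iterates remain stable and converge."""
    rng = random.Random(seed)
    theta = theta0
    for n in range(n_steps):
        a_n = 1.0 / (n + 1)
        theta += a_n * h_noisy(theta, rng)
    return theta

# Toy mean field h(theta) = -theta observed in unit-variance Gaussian noise;
# the iterates converge to the ODE's equilibrium at 0.
theta_hat = stochastic_approximation(
    lambda th, rng: -th + rng.gauss(0.0, 1.0), theta0=5.0
)
```

The step sizes satisfy the usual conditions Σ aₙ = ∞ and Σ aₙ² < ∞, so the noise averages out while the drift of the ODE still acts.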
In Part I we developed stability concepts for discrete chains, together with Foster-Lyapunov criteria for them to hold. Part II was devoted to developing related stability concepts for continuous-time processes. In this paper we develop criteria for these forms of stability for continuous-parameter Markovian processes on general state spaces, based on …
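For reference, the discrete-chain drift criterion these papers build on can be stated as follows. This is a standard form of the Foster-Lyapunov condition (often labelled (V3)); the notation here is generic, not quoted from the paper.

```latex
% Foster-Lyapunov drift condition for a discrete-time chain \Phi with
% transition kernel P: for functions V \ge 1 and f \ge 1, a petite set C,
% and a constant b < \infty,
\Delta V(x) := \int_{\mathsf{X}} P(x, \mathrm{d}y)\, V(y) - V(x)
  \;\le\; -f(x) + b\,\mathbf{1}_C(x), \qquad x \in \mathsf{X}.
```

When such a drift condition holds for a ψ-irreducible aperiodic chain, the chain is positive Harris recurrent and the stationary expectation of f is finite.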
We develop the use of piecewise linear test functions for the analysis of stability of multiclass queueing networks and their associated fluid limit models. It is found that if an associated LP admits a positive solution, then a Lyapunov function exists. This implies that the fluid limit model is stable and hence that the network model is positive Harris …
The average cost optimal control problem is addressed for Markov decision processes with unbounded cost. It is found that the policy iteration algorithm generates a sequence of policies which are c-regular (a strong stability condition), where c is the cost function under consideration. This result only requires the existence of an initial c-regular policy …
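The policy iteration algorithm referenced above can be sketched on a finite-state model. This is only an illustration of the algorithm's structure: the paper treats unbounded cost on general state spaces, whereas the 2-state, 2-action MDP and all numbers below are hypothetical, and each evaluation step solves the Poisson equation exactly.

```python
def gauss_solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def evaluate_policy(P, c, policy):
    """Average-cost policy evaluation for a unichain finite MDP: solve the
    Poisson equation eta + h(x) = c(x,u) + sum_y P(y|x,u) h(y), with the
    normalisation h(0) = 0.  Returns (eta, h)."""
    n = len(c)
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for x in range(n):
        u = policy[x]
        A[x][0] = 1.0                       # coefficient of eta
        for y in range(1, n):
            A[x][y] = (1.0 if y == x else 0.0) - P[x][u][y]
        b[x] = c[x][u]
    z = gauss_solve(A, b)
    return z[0], [0.0] + z[1:]

def policy_iteration(P, c, policy):
    """Howard's policy iteration: evaluate, then improve greedily w.r.t. h."""
    while True:
        eta, h = evaluate_policy(P, c, policy)
        new = [min(range(len(c[x])),
                   key=lambda u: c[x][u] + sum(P[x][u][y] * h[y]
                                               for y in range(len(c))))
               for x in range(len(c))]
        if new == policy:
            return policy, eta
        policy = new

# Hypothetical 2-state, 2-action example; P[x][u][y] = P(y | x, u).
P = [[[0.9, 0.1], [0.5, 0.5]],
     [[0.3, 0.7], [0.8, 0.2]]]
c = [[1.0, 2.0], [4.0, 1.5]]
pi_star, eta_star = policy_iteration(P, c, [0, 0])
```

In the paper's setting, the key point is that each iterate inherits c-regularity, so the evaluation step above remains well posed even with unbounded cost.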
This paper concerns the structure of capacity-achieving input distributions for stochastic channel models, and a renewed look at their computational aspects. The following conclusions are obtained under general assumptions on the channel statistics. i) The capacity-achieving input distribution is binary for low signal-to-noise ratio (SNR). The proof is …
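As a computational baseline for capacity-achieving input distributions, the classical Blahut-Arimoto iteration for a discrete memoryless channel can be sketched as below. This is the standard textbook algorithm, not the paper's method, and the binary symmetric channel at the end is an illustrative example.

```python
import math

def blahut_arimoto(W, n_iters=500):
    """Blahut-Arimoto iteration for the capacity of a discrete memoryless
    channel with transition matrix W[x][y] = P(y | x).  Alternates a
    posterior update q(x|y) with a re-weighting of the input law p(x)."""
    nx, ny = len(W), len(W[0])
    p = [1.0 / nx] * nx
    for _ in range(n_iters):
        # Posterior q(x | y) proportional to p(x) W(y | x).
        q = [[p[x] * W[x][y] for x in range(nx)] for y in range(ny)]
        for y in range(ny):
            z = sum(q[y]) or 1.0
            q[y] = [v / z for v in q[y]]
        # Re-weight the input distribution and normalise.
        r = [math.exp(sum(W[x][y] * math.log(q[y][x])
                          for y in range(ny) if W[x][y] > 0))
             for x in range(nx)]
        z = sum(r)
        p = [v / z for v in r]
    # Mutual information of the resulting input distribution, in bits.
    py = [sum(p[x] * W[x][y] for x in range(nx)) for y in range(ny)]
    cap = sum(p[x] * W[x][y] * math.log2(W[x][y] / py[y])
              for x in range(nx) for y in range(ny) if W[x][y] > 0)
    return p, cap

# Binary symmetric channel with crossover 0.1; capacity = 1 - h2(0.1) bits.
W = [[0.9, 0.1], [0.1, 0.9]]
p_star, C = blahut_arimoto(W)
```

For this symmetric channel the uniform input is optimal; the paper's observation concerns the different phenomenon that for continuous channels at low SNR the optimizer concentrates on just two mass points.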
This paper considers in parallel the scheduling problem for multiclass queueing networks and optimization of Markov decision processes. It is shown that the value iteration algorithm may perform poorly when the algorithm is not initialized properly. The most typical case, where the initial value function is taken to be zero, may be a particularly bad choice …
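The value iteration algorithm in question can be sketched for a finite-state average-cost model. The 2-state MDP and all numbers below are hypothetical; the zero initialization used here is harmless in such a tiny chain, whereas the paper's point is that for network models the initial value function should be a Lyapunov-like approximation of the relative value function.

```python
def value_iteration(P, c, V0, n_iters):
    """Undiscounted value iteration for the average-cost problem:
    V_{n+1}(x) = min_u [ c(x,u) + sum_y P(y|x,u) V_n(y) ].
    For a unichain aperiodic model, V_{n+1}(x) - V_n(x) converges to the
    optimal average cost eta* for every state x."""
    V = V0[:]
    for _ in range(n_iters):
        V = [min(c[x][u] + sum(P[x][u][y] * V[y] for y in range(len(V)))
                 for u in range(len(c[x])))
             for x in range(len(V))]
    return V

# Hypothetical 2-state, 2-action example; P[x][u][y] = P(y | x, u).
P = [[[0.9, 0.1], [0.4, 0.6]],
     [[0.2, 0.8], [0.7, 0.3]]]
c = [[1.0, 3.0], [5.0, 2.0]]

V = value_iteration(P, c, [0.0, 0.0], 400)   # zero initialization
V_next = value_iteration(P, c, V, 1)          # one more Bellman update
eta_est = V_next[0] - V[0]                    # estimate of eta*
```

Note that V itself grows without bound (roughly n·η* plus a relative value term); it is the increments that stabilize, which is why the quality of V0 governs the transient behavior.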
We address the problem of computing the optimal Q-function in Markov decision problems with infinite state-space. We analyze the convergence properties of several variations of Q-learning when combined with function approximation, extending the analysis of TD-learning in (Tsitsiklis & Van Roy, 1996a) to stochastic control settings. We identify …
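Q-learning with a linear architecture can be sketched as follows. The MDP, the features, and all constants below are hypothetical; with the one-hot features used here the linear parameterization reduces to the tabular case, where convergence to Q* is standard, whereas the paper's setting of genuinely compressed features over an infinite state space is substantially more delicate.

```python
import random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def q_learning_linear(P, c, gamma, features, dim, n_steps=60_000, seed=1):
    """Q-learning with a linear approximator Q_theta(x,u) = theta . phi(x,u),
    driven by a uniformly random behaviour policy on a discounted-cost MDP.
    Updates theta along phi(x,u) by the temporal-difference error."""
    rng = random.Random(seed)
    theta = [0.0] * dim
    counts = {}
    n_states, n_actions = len(c), len(c[0])
    x = 0
    for _ in range(n_steps):
        u = rng.randrange(n_actions)
        # Sample the next state from P(. | x, u).
        x_next = rng.choices(range(n_states), weights=P[x][u])[0]
        q_next = min(dot(theta, features(x_next, v)) for v in range(n_actions))
        phi = features(x, u)
        td = c[x][u] + gamma * q_next - dot(theta, phi)
        counts[(x, u)] = counts.get((x, u), 0) + 1
        a = 1.0 / counts[(x, u)]            # diminishing step size
        theta = [t + a * td * p for t, p in zip(theta, phi)]
        x = x_next
    return theta

# Hypothetical 2-state, 2-action example with one-hot (tabular) features.
P = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]
c = [[0.0, 10.0], [10.0, 0.0]]

def one_hot(x, u):
    phi = [0.0] * 4
    phi[2 * x + u] = 1.0
    return phi

theta = q_learning_linear(P, c, gamma=0.9, features=one_hot, dim=4)
```

For this example the exact solution is Q*(0,0) = Q*(1,1) = 0 and Q*(0,1) = Q*(1,0) = 10, so the learned greedy policy takes action 0 in state 0 and action 1 in state 1.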
In this paper we consider a φ-irreducible continuous-parameter Markov process Φ whose state space is a general topological space. The recurrence and Harris recurrence structure of Φ is developed in terms of generalized forms of resolvent chains, where we allow state-modulated resolvents and embedded chains with arbitrary sampling distributions. We show that …