This work is a survey of the average cost control problem for discrete-time Markov processes. The authors have attempted to put together a comprehensive account of the considerable research on this problem over the past three decades. The exposition ranges from finite to Borel state and action spaces and includes a variety of methodologies to find and...
Based on recent results for multiarmed bandit problems, we propose an adaptive sampling algorithm that approximates the optimal value of a finite-horizon Markov decision process (MDP) with finite state and action spaces. The algorithm adaptively chooses which action to sample as the sampling process proceeds and generates an asymptotically unbiased...
This paper proposes a simple analytical model called M timescale Markov Decision Process (MMDP) for hierarchically structured sequential decision making processes, where decisions in each level in the M-level hierarchy are made in different discrete timescales. In this model, the state space and the control space of each level in the hierarchy are...
We propose a time aggregation approach for the solution of infinite horizon average cost Markov decision processes via policy iteration. In this approach, policy update is only carried out when the process visits a subset of the state space. As in state aggregation, this approach leads to a reduced state space, which may lead to a substantial reduction in...
We propose a novel algorithm called evolutionary policy iteration (EPI) for solving infinite horizon discounted reward Markov...
We introduce a new randomized method called Model Reference Adaptive Search (MRAS) for solving global optimization problems. The method works with a parameterized probabilistic model on the solution space and generates at each iteration a group of candidate solutions. These candidate solutions are then used to update the parameters associated with the...
A protocol mismatch occurs when heterogeneous networks try to communicate with each other. Such mismatches are inevitable due to the proliferation of a multitude of networking architectures, hardware, and software on one hand, and the need for global connectivity on the other hand. In order to circumvent this problem the solution of protocol conversion has...
In this paper, we consider Simultaneous Perturbation Stochastic Approximation (SPSA) for function minimization. The standard assumption for convergence is that the function be three times differentiable, although weaker assumptions have been used for special cases. However, all work that we are aware of at least requires differentiability. In this paper, we...
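For readers unfamiliar with SPSA, the following is a minimal sketch of the standard two-measurement form of the method for function minimization, not the variant analyzed in the paper above. The function name `spsa_minimize`, the default gain constants, and the quadratic test objective are illustrative assumptions; the decay exponents 0.602 and 0.101 are commonly recommended values for the gain sequences.

```python
import random


def spsa_minimize(f, x0, iterations=2000, a=0.1, c=0.1, A=100.0,
                  alpha=0.602, gamma=0.101, seed=0):
    """Minimize f with basic SPSA (illustrative sketch, assumed defaults).

    Each iteration perturbs all coordinates at once with a random
    +/-1 vector and uses two evaluations of f to estimate the gradient.
    """
    rng = random.Random(seed)
    x = list(x0)
    for k in range(iterations):
        a_k = a / (k + 1 + A) ** alpha   # step-size gain
        c_k = c / (k + 1) ** gamma       # perturbation size
        # Symmetric Bernoulli +/-1 perturbation direction.
        delta = [rng.choice((-1.0, 1.0)) for _ in x]
        x_plus = [xi + c_k * di for xi, di in zip(x, delta)]
        x_minus = [xi - c_k * di for xi, di in zip(x, delta)]
        diff = f(x_plus) - f(x_minus)
        # Simultaneous perturbation gradient estimate, applied per coordinate.
        x = [xi - a_k * diff / (2.0 * c_k * di) for xi, di in zip(x, delta)]
    return x


def shifted_quadratic(x):
    """Hypothetical smooth test objective with its minimum at (1, ..., 1)."""
    return sum((xi - 1.0) ** 2 for xi in x)


if __name__ == "__main__":
    print(spsa_minimize(shifted_quadratic, [0.0, 0.0, 0.0]))
```

Only two function evaluations are needed per iteration regardless of dimension, which is the main appeal of the method compared with coordinate-wise finite differences.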