@inproceedings{Moallemi2003DistributedOI, title={Distributed Optimization in Adaptive Networks: Appendix}, author={Ciamac C. Moallemi}, year={2003} }

- Published 2003

1 Markov Decision Processes

Consider a Markov chain $(w(k), a(k))$ defined for $k = 0, 1, \ldots$, with $w(k) \in W$ and $a(k) \in A$, where $W$ and $A$ are finite sets representing the system state space and the action space, respectively. The transition probabilities are defined by the function

$$P_\theta(w', a', w, a) = \Pr\{\, w(k+1) = w,\ a(k+1) = a \mid w(k) = w',\ a(k) = a' \,\}.$$

Here, $\theta \in \mathbb{R}^N$ is a vector of policy parameters. We will make the following assumption regarding the dynamics.

Assumption 1.1. For all $\theta$, the…
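
As a rough illustration of the setup above, the following Python sketch builds a toy transition function over joint (state, action) pairs and samples a trajectory from it. Everything specific here is an assumption made for illustration and not taken from the paper: the two-element sets `W` and `A`, the environment table `ENV`, the softmax form of the policy, and the factorization of the joint kernel into an environment step followed by a policy step. The source only states that the transition probabilities depend on a parameter vector θ.

```python
import math
import random

# Hypothetical toy sizes; the source only requires W and A to be finite.
W = [0, 1]  # system state space
A = [0, 1]  # action space

# Assumed environment kernel (not from the source):
# ENV[w_prev][a_prev][w] = Pr(next state is w | previous state w_prev, previous action a_prev)
ENV = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.3, 0.7]],
]


def policy(theta, w):
    """Assumed softmax policy: probabilities over actions in state w.

    theta is a flat parameter vector of length N = len(W) * len(A),
    with theta[w * len(A) + a] scoring action a in state w.
    """
    scores = [theta[w * len(A) + a] for a in A]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def P(theta, w_prev, a_prev, w, a):
    """Joint transition probability from (w_prev, a_prev) to (w, a).

    Assumed factorization: the environment draws the next state,
    then the policy draws the next action given that state.
    """
    return ENV[w_prev][a_prev][w] * policy(theta, w)[a]


def simulate(theta, steps, seed=0):
    """Sample a trajectory of (w(k), a(k)) pairs for k = 0, ..., steps."""
    rng = random.Random(seed)
    pairs = [(w, a) for w in W for a in A]
    w, a = W[0], A[0]
    path = [(w, a)]
    for _ in range(steps):
        weights = [P(theta, w, a, w2, a2) for (w2, a2) in pairs]
        w, a = rng.choices(pairs, weights=weights, k=1)[0]
        path.append((w, a))
    return path


theta = [0.0, 1.0, -0.5, 0.5]  # arbitrary example parameters, N = 4
trajectory = simulate(theta, steps=10)
```

For any fixed previous pair, the values of `P` summed over all next pairs equal 1, so the function defines a valid Markov chain on the joint space for every choice of `theta`. Changing `theta` changes the chain, which is the sense in which θ parameterizes the dynamics.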