Learn More
We propose and analyze a class of actor-critic algorithms for simulation-based optimization of a Markov decision process over a parameterized family of randomized stationary policies. These are two-timescale algorithms in which the critic uses TD learning with a linear approximation architecture and the actor is updated in an approximate gradient direction(More)
We explore the properties of a congestion game where users of a congested resource anticipate the effect of their actions on the price of the resource. When users are sharing a single resource, we establish that the aggregate utility received by the users is at least 3/4 of the maximum possible aggregate utility. We also consider extensions to a network(More)