Partially observable Markov decision processes (POMDPs) are interesting because they provide a general framework for learning in the presence of multiple forms of uncertainty. We survey methods for learning within the POMDP framework. Because exact methods are intractable, we concentrate on approximate methods. We explore two versions of the POMDP training …
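The abstract does not specify an algorithm, but the uncertainty a POMDP tracks is usually summarised by a belief state. As a minimal sketch, here is the standard discrete Bayesian belief update; the transition and observation tensors and all numbers are hypothetical:

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """One Bayesian belief update for a discrete POMDP.

    b: current belief over states, shape (S,)
    T: transition probabilities, T[a, s, s2] = P(s2 | s, a)
    O: observation probabilities, O[a, s2, o] = P(o | s2, a)
    """
    # Predict: push the belief through the transition model.
    predicted = b @ T[a]                    # shape (S,)
    # Correct: reweight by the likelihood of the observation.
    unnormalised = predicted * O[a, :, o]
    return unnormalised / unnormalised.sum()

# Tiny two-state, one-action example with made-up probabilities.
T = np.array([[[0.9, 0.1], [0.2, 0.8]]])
O = np.array([[[0.7, 0.3], [0.1, 0.9]]])
b = np.array([0.5, 0.5])
b_next = belief_update(b, a=0, o=0, T=T, O=O)
```

Observation 0 is more likely in state 0 here, so the updated belief shifts mass toward state 0.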
The Priority Inbox feature of Gmail ranks mail by the probability that the user will perform an action on that mail. Because "importance" is highly personal, we try to predict it by learning a per-user statistical model, updated as frequently as possible. This research note describes the challenges of online learning over millions of models, and the …
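The snippet does not name the model family. As one plausible sketch of a frequently updated per-user model, here is a single online logistic-regression step over sparse features; the feature names and learning rate are hypothetical:

```python
import math

def sgd_logistic_update(w, x, y, lr=0.1):
    """One online logistic-regression step.

    w: per-user weights, dict feature -> weight (updated in place)
    x: sparse features, dict feature -> value
    y: 1 if the user acted on the mail, else 0
    Returns the predicted action probability before the update.
    """
    z = sum(w.get(f, 0.0) * v for f, v in x.items())
    p = 1.0 / (1.0 + math.exp(-z))          # predicted "importance"
    for f, v in x.items():                  # gradient step on log loss
        w[f] = w.get(f, 0.0) - lr * (p - y) * v
    return p

w = {}                                       # a fresh per-user model
# The user repeatedly acts on mail with a (hypothetical) "from_boss" feature.
for _ in range(200):
    sgd_logistic_update(w, {"from_boss": 1.0, "bias": 1.0}, y=1)
p = sgd_logistic_update(w, {"from_boss": 1.0, "bias": 1.0}, y=1)
```

After a few hundred consistent signals, the predicted probability for that feature pattern climbs well above the uninformed 0.5 prior.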
Acknowledgements Academic Primary thanks go to Jonathan Baxter, my main advisor, who kept up his supervision despite going to work in the "real world." The remainder of my panel was Sylvie Thiébaux, Peter Bartlett, and Bruce Millar, all of whom gave invaluable advice. Thanks also to Bob Edwards for constructing the "Bunyip" Linux cluster and …
Reinforcement learning by direct policy-gradient estimation is attractive in theory but in practice leads to notoriously ill-behaved optimization problems. We improve its robustness and speed of convergence with stochastic meta-descent, a gain-vector adaptation method that employs fast Hessian-vector products. In our experiments the resulting algorithms …
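The central quantity stochastic meta-descent needs is the product of the Hessian with a vector, without ever forming the Hessian. The paper computes this exactly and fast; as a stand-in, the same quantity can be approximated by a central finite difference of the gradient, shown here on a toy quadratic where the true answer is known:

```python
import numpy as np

def hessian_vector_product(grad_fn, w, v, eps=1e-5):
    """Approximate H v via a central difference of the gradient:
    H v ≈ (grad(w + eps v) - grad(w - eps v)) / (2 eps).
    (A numerical stand-in; SMD uses exact fast H v products.)"""
    return (grad_fn(w + eps * v) - grad_fn(w - eps * v)) / (2 * eps)

# Toy quadratic f(w) = 0.5 w^T A w with symmetric A, so H = A exactly.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
grad = lambda w: A @ w
v = np.array([1.0, -1.0])
hv = hessian_vector_product(grad, np.zeros(2), v)
```

For this quadratic the gradient is linear, so the finite difference recovers `A @ v` up to floating-point error; on a real objective the approximation error scales with `eps`.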
Current road-traffic optimisation practice around the world is a combination of hand-tuned policies with a small degree of automatic adaptation. Even state-of-the-art research controllers need good models of the road traffic, which cannot be obtained directly from existing sensors. We use a policy-gradient reinforcement learning approach to directly optimise …
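The snippet does not say which policy-gradient estimator is used. As a minimal illustration of the general idea, here is a REINFORCE-style update for a one-parameter Bernoulli policy; the "extend the green phase" reward structure is a toy stand-in, not the paper's traffic model:

```python
import math
import random

def reinforce_step(theta, episodes=100, lr=0.05):
    """One REINFORCE update for a Bernoulli policy: action 1
    (say, extending a green phase) is taken with prob sigmoid(theta).
    The toy reward simply favours action 1."""
    grad = 0.0
    for _ in range(episodes):
        p = 1.0 / (1.0 + math.exp(-theta))
        a = 1 if random.random() < p else 0
        r = float(a)                 # reward 1 iff action 1 was taken
        grad += (a - p) * r          # (a - p) is d/dtheta log pi(a)
    return theta + lr * grad / episodes

random.seed(0)                       # fixed seed for reproducibility
theta = 0.0
for _ in range(300):
    theta = reinforce_step(theta)
```

The gradient of the log-likelihood, weighted by reward, steadily pushes the policy toward the rewarded action; here `theta` grows well above zero.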
Military operations planning involves concurrent actions, resource assignment, and conflicting costs. Individual tasks sometimes fail with a known probability, motivating a decision-theoretic approach. The planner must choose between multiple tasks that achieve similar outcomes but have different costs. The military domain is particularly suited to automated …
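Choosing between tasks with known failure probabilities reduces, in the simplest case, to comparing expected costs. A minimal sketch, assuming independent attempts repeated until success (the costs and probabilities below are invented for illustration):

```python
def expected_cost(cost_per_attempt, p_fail):
    """Expected total cost of repeating a task until it succeeds,
    assuming independent attempts that each fail with prob p_fail.
    Expected attempts = 1 / (1 - p_fail)."""
    return cost_per_attempt / (1.0 - p_fail)

# A cheap but unreliable task versus a pricier, more reliable one.
cheap = expected_cost(10.0, p_fail=0.5)      # 10 / 0.5  = 20.0
reliable = expected_cost(15.0, p_fail=0.1)   # 15 / 0.9  ≈ 16.67
best = min(cheap, reliable)
```

Despite its higher per-attempt cost, the reliable task wins on expected cost, which is exactly the kind of trade-off a decision-theoretic planner resolves.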