We study a general online convex optimization problem. We have a convex set <i>S</i> and an unknown sequence of cost functions <i>c</i><inf>1</inf>, <i>c</i><inf>2</inf>, ..., and in each period, we choose a feasible point <i>x<inf>t</inf></i> in <i>S</i>, and learn the cost <i>c<inf>t</inf></i>(<i>x<inf>t</inf></i>). If the function <i>c<inf>t</inf></i> is ...
MDPs are an attractive formalization for planning, but realistic problems often have intractably large state spaces. When we only need a partial policy to get from a fixed start state to a goal, restricting computation to states relevant to this task can make much larger problems tractable. We introduce a new algorithm, Bounded RTDP, which can produce ...
We give an algorithm for the bandit version of a very general online optimization problem considered by Kalai and Vempala [1], for the case of an adaptive adversary. In this problem we are given a bounded set S ⊆ ℝ^n of feasible points. At each time step t, the online algorithm must select a point x_t ∈ S while simultaneously an adversary selects a cost ...
We investigate methods for planning in a Markov Decision Process where the cost function is chosen by an adversary after we fix our policy. As a running example, we consider a robot path planning problem where costs are influenced by sensors that an adversary places in the environment. We formulate the problem as a zero-sum matrix game where rows correspond ...
We prove that many mirror descent algorithms for online convex optimization (such as online gradient descent) have an equivalent interpretation as follow-the-regularized-leader (FTRL) algorithms. This observation makes the relationships between many commonly used algorithms explicit, and provides theoretical insight on previous experimental observations. In ...
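The simplest instance of this equivalence can be checked numerically. The sketch below is not taken from the paper; it assumes the unconstrained case with linear losses, a fixed learning rate `eta`, and a starting point at the origin, and the function names are hypothetical. Under those assumptions, online gradient descent and FTRL with the regularizer ||x||^2 / (2 * eta) produce the same iterates, because the FTRL minimizer has the closed form -eta times the sum of past gradients.

```python
def ogd_iterates(gradients, eta):
    """Online gradient descent from the origin: x_{t+1} = x_t - eta * g_t."""
    x = [0.0] * len(gradients[0])
    iterates = [list(x)]
    for g in gradients:
        x = [xi - eta * gi for xi, gi in zip(x, g)]
        iterates.append(list(x))
    return iterates


def ftrl_iterates(gradients, eta):
    """FTRL with regularizer ||x||^2 / (2 * eta) on linear losses.

    The minimizer of sum_s g_s . x + ||x||^2 / (2 * eta) over all of R^n
    is x = -eta * sum_s g_s, so no numerical solver is needed.
    """
    dim = len(gradients[0])
    g_sum = [0.0] * dim
    iterates = [[0.0] * dim]
    for g in gradients:
        g_sum = [s + gi for s, gi in zip(g_sum, g)]
        iterates.append([-eta * s for s in g_sum])
    return iterates
```

With a feasible set, projections, or a time-varying learning rate, the correspondence is more delicate, and this sketch does not cover those cases.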
We introduce a new online convex optimization algorithm that adaptively chooses its regularization function based on the loss functions observed so far. This is in contrast to previous algorithms that use a fixed regularization function such as L2-squared, and modify it only via a single time-dependent parameter. Our algorithm's regret bounds are ...
We consider the problem of selecting actions in order to maximize rewards chosen by an adversary, where the set of actions available on any given round is selected stochastically. We present the first polynomial-time no-regret algorithm for this setting. In the full-observation (experts) version of the problem, we present an exponential-weights algorithm ...
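For orientation, here is a minimal sketch of the standard exponential-weights (Hedge) update in the full-observation setting. It is not the algorithm from this paper: it assumes every action is available on every round, whereas the abstract concerns stochastically chosen action sets. The function name and the learning-rate parameter `eta` are illustrative assumptions.

```python
import math


def exponential_weights(reward_rounds, eta):
    """Return the distribution played on each round by standard Hedge.

    reward_rounds: list of per-round reward vectors, one entry per action.
    eta: learning rate (an assumed tuning parameter).
    """
    n = len(reward_rounds[0])
    log_w = [0.0] * n  # log-weights, kept in log space for stability
    distributions = []
    for rewards in reward_rounds:
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        total = sum(w)
        # Play action i with probability proportional to its weight.
        distributions.append([wi / total for wi in w])
        # After observing all rewards, multiply weight i by exp(eta * r_i).
        log_w = [lw + eta * r for lw, r in zip(log_w, rewards)]
    return distributions
```

The first round plays the uniform distribution, and probability mass then shifts toward actions with higher cumulative reward.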
Convex games are a natural generalization of matrix (normal-form) games that can compactly model many strategic interactions with interesting structure. We present a new anytime algorithm for such games that leverages fast best-response oracles for both players to build a model of the overall game. This model is used to identify search directions; the ...