Learn More
We study a general online convex optimization problem. We have a convex set <i>S</i> and an unknown sequence of cost functions <i>c</i><inf>1</inf>, <i>c</i><inf>2</inf>,..., and in each period, we choose a feasible point <i>x<inf>t</inf></i> in <i>S</i>, and learn the cost <i>c<inf>t</inf></i>(<i>x<inf>t</inf></i>). If the function <i>c<inf>t</inf></i> is(More)
MDPs are an attractive formalization for planning, but realistic problems often have intractably large state spaces. When we only need a partial policy to get from a fixed start state to a goal, restricting computation to states relevant to this task can make much larger problems tractable. We introduce a new algorithm, Bounded RTDP, which can produce(More)
We investigate methods for planning in a Markov Decision Process where the cost function is chosen by an adversary after we fix our policy. As a running example, we consider a robot path planning problem where costs are influenced by sensors that an adversary places in the environment. We formulate the problem as a zero-sum matrix game where rows correspond(More)
We give an algorithm for the bandit version of a very general online optimization problem considered by Kalai and Vempala [1], for the case of an adaptive adversary. In this problem we are given a bounded set S ¢ ¤ £ n of feasible points. At each time step t, the online algorithm must select a point x t ¥ S while simultaneously an adversary selects a cost(More)
We introduce a new online convex optimization algorithm that adaptively chooses its regulariza-tion function based on the loss functions observed so far. This is in contrast to previous algorithms that use a fixed regularization function such as L2-squared, and modify it only via a single time-dependent parameter. Our algorithm's regret bounds are(More)
We prove that many mirror descent algorithms for online convex optimization (such as online gradient descent) have an equivalent interpretation as follow-the-regularized-leader (FTRL) algorithms. This observation makes the relationships between many commonly used algorithms explicit, and provides theoretical insight on previous experimental observations. In(More)
Convex games are a natural generalization of matrix (normal-form) games that can compactly model many strategic interactions with interesting structure. We present a new anytime algorithm for such games that leverages fast best-response oracles for both players to build a model of the overall game. This model is used to identify search directions; the(More)
We consider the problem of selecting actions in order to maximize rewards chosen by an adversary, where the set of actions available on any given round is selected stochas-tically. We present the first polynomial-time no-regret algorithm for this setting. In the full-observation (experts) version of the problem , we present an exponential-weights algorithm(More)
We study stochastic optimization problems when the data is sparse, which is in a sense dual to current perspectives on high-dimensional statistical learning and optimization. We highlight both the difficulties—in terms of increased sample complexity that sparse data necessitates—and the potential benefits, in terms of allowing parallelism and asynchrony in(More)