#### Filter Results:

#### Publication Year

2003

2016

#### Publication Type

#### Co-author

#### Key Phrase

#### Publication Venue

Learn More

We study a general online convex optimization problem. We have a convex set <i>S</i> and an unknown sequence of cost functions <i>c</i><inf>1</inf>, <i>c</i><inf>2</inf>,..., and in each period, we choose a feasible point <i>x<inf>t</inf></i> in <i>S</i>, and learn the cost <i>c<inf>t</inf></i>(<i>x<inf>t</inf></i>). If the function <i>c<inf>t</inf></i> is… (More)

MDPs are an attractive formalization for planning, but realistic problems often have intractably large state spaces. When we only need a partial policy to get from a fixed start state to a goal, restricting computation to states relevant to this task can make much larger problems tractable. We introduce a new algorithm, Bounded RTDP, which can produce… (More)

We give an algorithm for the bandit version of a very general online optimization problem considered by Kalai and Vempala [1], for the case of an adaptive adversary. In this problem we are given a bounded set S ¢ ¤ £ n of feasible points. At each time step t, the online algorithm must select a point x t ¥ S while simultaneously an adversary selects a cost… (More)

We investigate methods for planning in a Markov Decision Process where the cost function is chosen by an adversary after we fix our policy. As a running example, we consider a robot path planning problem where costs are influenced by sensors that an adversary places in the environment. We formulate the problem as a zero-sum matrix game where rows correspond… (More)

We prove that many mirror descent algorithms for online convex optimization (such as online gradient descent) have an equivalent interpretation as follow-the-regularized-leader (FTRL) algorithms. This observation makes the relationships between many commonly used algorithms explicit, and provides theoretical insight on previous experimental observations. In… (More)

We introduce a new online convex optimization algorithm that adaptively chooses its regulariza-tion function based on the loss functions observed so far. This is in contrast to previous algorithms that use a fixed regularization function such as L2-squared, and modify it only via a single time-dependent parameter. Our algorithm's regret bounds are… (More)

We consider the problem of selecting actions in order to maximize rewards chosen by an adversary, where the set of actions available on any given round is selected stochas-tically. We present the first polynomial-time no-regret algorithm for this setting. In the full-observation (experts) version of the problem , we present an exponential-weights algorithm… (More)

- Andreas Krause, Anupam Gupta, Tom Mitchell, Tommi Jaakkola, Carlos Guestrin, Jure Leskovec +17 others
- 2008

We study the problem of computing the optimal value function for a Markov decision process with positive costs. Computing this function quickly and accurately is a basic step in many schemes for deciding how to act in stochastic environments. There are efficient algorithms which compute value functions for special types of MDPs: for deterministic MDPs with… (More)

Convex games are a natural generalization of matrix (normal-form) games that can compactly model many strategic interactions with interesting structure. We present a new anytime algorithm for such games that leverages fast best-response oracles for both players to build a model of the overall game. This model is used to identify search directions; the… (More)