We study a general online convex optimization problem. We have a convex set $S$ and an unknown sequence of cost functions $c_1, c_2, \ldots$, and in each period, we choose a feasible point $x_t$ in $S$, and learn the cost $c_t(x_t)$. If the function $c_t$ is…

In many applications, one has to actively select among a set of expensive observations before making an informed decision. For example, in environmental monitoring, we want to select locations to measure in order to most effectively predict spatial phenomena. Often, we want to select observations which are robust against a number of possible objective…

MDPs are an attractive formalization for planning, but realistic problems often have intractably large state spaces. When we only need a partial policy to get from a fixed start state to a goal, restricting computation to states relevant to this task can make much larger problems tractable. We introduce a new algorithm, Bounded RTDP, which can produce…

- H. Brendan McMahan, Avrim Blum
- COLT
- 2004

We give an algorithm for the bandit version of a very general online optimization problem considered by Kalai and Vempala [1], for the case of an adaptive adversary. In this problem we are given a bounded set $S \subseteq \mathbb{R}^n$ of feasible points. At each time step $t$, the online algorithm must select a point $x_t \in S$ while simultaneously an adversary selects a cost…

- H. Brendan McMahan, Geoffrey J. Gordon, Avrim Blum
- ICML
- 2003

We investigate methods for planning in a Markov Decision Process where the cost function is chosen by an adversary after we fix our policy. As a running example, we consider a robot path planning problem where costs are influenced by sensors that an adversary places in the environment. We formulate the problem as a zero-sum matrix game where rows correspond…

- H. Brendan McMahan
- AISTATS
- 2011

We prove that many mirror descent algorithms for online convex optimization (such as online gradient descent) have an equivalent interpretation as follow-the-regularized-leader (FTRL) algorithms. This observation makes the relationships between many commonly used algorithms explicit, and provides theoretical insight on previous experimental observations. In…
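The simplest instance of this equivalence can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's construction: it takes the unconstrained case with a fixed step size `eta`, starts at the origin, and uses a quadratic regularizer $\|x\|^2 / (2\eta)$ on linearized losses. The function names `ogd_iterates` and `ftrl_iterates` and the random gradient sequence are hypothetical, introduced only for this example.

```python
import numpy as np

def ogd_iterates(grads, eta):
    """Unconstrained online gradient descent from x_1 = 0 with fixed step size eta."""
    x = np.zeros(grads.shape[1])
    iterates = [x.copy()]
    for g in grads:
        x = x - eta * g          # gradient step on the linearized loss g . x
        iterates.append(x.copy())
    return np.array(iterates)

def ftrl_iterates(grads, eta):
    """FTRL on linearized losses with regularizer ||x||^2 / (2 * eta).

    Each iterate minimizes (sum of past gradients) . x + ||x||^2 / (2 * eta),
    whose closed-form minimizer is -eta * (sum of past gradients).
    """
    zero = np.zeros((1, grads.shape[1]))
    cumulative = np.vstack([zero, np.cumsum(grads, axis=0)])
    return -eta * cumulative

# Assumed toy data: 50 rounds of 3-dimensional gradients fixed in advance.
rng = np.random.default_rng(0)
grads = rng.normal(size=(50, 3))

ogd = ogd_iterates(grads, eta=0.1)
ftrl = ftrl_iterates(grads, eta=0.1)
print(np.allclose(ogd, ftrl))
```

With projections onto a feasible set or time-varying step sizes the correspondence is more subtle, and this sketch does not cover those cases.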

- Martín Abadi, Andy Chu, +4 authors Li Zhang
- ACM Conference on Computer and Communications…
- 2016

Machine learning techniques based on neural networks are achieving remarkable results in a wide variety of domains. Often, the training of models requires large, representative datasets, which may be crowdsourced and contain sensitive information. The models should not expose private information in these datasets. Addressing this goal, we develop new…

- H. Brendan McMahan, Geoffrey J. Gordon
- ICAPS
- 2005

We study the problem of computing the optimal value function for a Markov decision process with positive costs. Computing this function quickly and accurately is a basic step in many schemes for deciding how to act in stochastic environments. There are efficient algorithms which compute value functions for special types of MDPs: for deterministic MDPs with…

- H. Brendan McMahan, Geoffrey J. Gordon
- AISTATS
- 2007

Convex games are a natural generalization of matrix (normal-form) games that can compactly model many strategic interactions with interesting structure. We present a new anytime algorithm for such games that leverages fast best-response oracles for both players to build a model of the overall game. This model is used to identify search directions; the…

- Varun Kanade, H. Brendan McMahan, Brent Bryan
- AISTATS
- 2009

We consider the problem of selecting actions in order to maximize rewards chosen by an adversary, where the set of actions available on any given round is selected stochastically. We present the first polynomial-time no-regret algorithm for this setting. In the full-observation (experts) version of the problem, we present an exponential-weights algorithm…