- Peter Auer, Nicolò Cesa-Bianchi, Paul Fischer
- Machine Learning
- 2002

Reinforcement learning policies face the exploration versus exploitation dilemma, i.e., the search for a balance between exploring the environment to find profitable actions and taking the empirically best action as often as possible. A popular measure of a policy's success in addressing this dilemma is the regret, that is, the loss due to the fact that the…
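To make the notion of regret concrete, here is a minimal sketch of an index-based bandit policy on simulated Bernoulli arms. It uses the UCB1 index (empirical mean plus `sqrt(2 ln n / n_j)`) associated with this paper; the function name `ucb1`, the Bernoulli reward model, and the example means are assumptions chosen for illustration, and the quantity returned is the expected (pseudo-)regret computed from the true means, not a result reported in the paper.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Simulate UCB1 on Bernoulli arms with the given true means.

    Returns (pull counts per arm, pseudo-regret), where pseudo-regret is
    the sum over arms of (best mean - arm mean) * number of pulls.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # number of times each arm was played
    sums = [0.0] * k   # total reward collected from each arm

    for _ in range(horizon):
        total = sum(counts)
        if total < k:
            # Initialisation: play each arm once.
            arm = total
        else:
            # Play the arm with the largest upper confidence index.
            arm = max(
                range(k),
                key=lambda j: sums[j] / counts[j]
                + math.sqrt(2.0 * math.log(total) / counts[j]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    best = max(means)
    regret = sum((best - m) * c for m, c in zip(means, counts))
    return counts, regret


if __name__ == "__main__":
    pulls, regret = ucb1([0.2, 0.5, 0.8], horizon=2000)
    print("pulls per arm:", pulls)
    print("pseudo-regret:", regret)
```

Pulls of a suboptimal arm each add that arm's gap to the regret, so a policy that explores too much or too little both show up as a larger value.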

This book surveys the state of the art in a rapidly expanding research area at the crossroads of learning theory, statistics, game theory, and information theory; it also presents some new viewpoints and results. The range of applications considered (or of applications…

- Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, Robert E. Schapire
- SIAM J. Comput.
- 2002

In the multiarmed bandit problem, a gambler must decide which arm of K non-identical slot machines to play in a sequence of trials so as to maximize his reward. This classical problem has received much attention because of the simple model it provides of the trade-off between exploration (trying out each arm to find the best one) and exploitation (playing…

- Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, Robert E. Schapire
- Electronic Colloquium on Computational Complexity
- 1995

In the multi-armed bandit problem, a gambler must decide which arm of K non-identical slot machines to play in a sequence of trials so as to maximize his reward. This classical problem has received much attention because of the simple model it provides of the trade-off between exploration (trying out each arm to find the best one) and exploitation (playing…

- Noga Alon, Shai Ben-David, Nicolò Cesa-Bianchi, David Haussler
- J. ACM
- 1993

Learnability in Valiant's PAC learning model has been shown to be strongly related to the existence of uniform laws of large numbers. These laws define a distribution-free convergence property of means to expectations uniformly over classes of random variables. Classes of real-valued functions enjoying such a property are also known as uniform…

- Sébastien Bubeck, Nicolò Cesa-Bianchi
- Foundations and Trends in Machine Learning
- 2012

Multi-armed bandit problems are the most basic examples of sequential decision problems with an exploration–exploitation trade-off. This is the balance between staying with the option that gave highest payoffs in the past and exploring new options that might give higher payoffs in the future. Although the study of bandit problems dates back to the 1930s,…

We analyze algorithms that predict a binary value by combining the predictions of several prediction strategies, called *experts*. Our analysis is for worst-case situations, i.e., we make no assumptions about the way the sequence of bits to be predicted is generated. We measure the performance of the algorithm by the difference between the…
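The setting described here can be illustrated with a short sketch. The truncated abstract does not specify the paper's algorithm, so the code below substitutes the classic deterministic weighted majority rule as a stand-in: each expert carries a weight, the forecaster predicts the weighted-majority bit, and experts that err have their weight multiplied by a factor `beta`. The function name, the value `beta=0.5`, and the toy data are assumptions made for illustration only.

```python
def weighted_majority(expert_preds, outcomes, beta=0.5):
    """Predict bits by weighted majority vote over experts.

    expert_preds[t][i] is expert i's 0/1 prediction at round t and
    outcomes[t] is the true bit. Returns (forecaster mistakes,
    mistakes of the best single expert).
    """
    n = len(expert_preds[0])
    weights = [1.0] * n
    mistakes = 0

    for preds, y in zip(expert_preds, outcomes):
        vote_one = sum(w for w, p in zip(weights, preds) if p == 1)
        vote_zero = sum(w for w, p in zip(weights, preds) if p == 0)
        guess = 1 if vote_one >= vote_zero else 0
        mistakes += int(guess != y)
        # Shrink the weight of every expert that was wrong this round.
        weights = [w * beta if p != y else w for w, p in zip(weights, preds)]

    expert_mistakes = [
        sum(int(preds[i] != y) for preds, y in zip(expert_preds, outcomes))
        for i in range(n)
    ]
    return mistakes, min(expert_mistakes)


if __name__ == "__main__":
    outcomes = [1, 0, 1, 1, 0, 0, 1, 0]
    # Expert 0 is always right, expert 1 always wrong, expert 2 alternates.
    preds = [[y, 1 - y, t % 2] for t, y in enumerate(outcomes)]
    total, best = weighted_majority(preds, outcomes)
    print("forecaster mistakes:", total, "best expert mistakes:", best)
```

The gap between the forecaster's mistake count and the best expert's is the kind of worst-case performance difference the abstract refers to; no assumption is made about how `outcomes` is generated.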

We study the problem of hierarchical classification when labels corresponding to partial and/or multiple paths in the underlying taxonomy are allowed. We introduce a new hierarchical loss function, the H-loss, implementing the simple intuition that additional mistakes in the subtree of a mistaken class should not be charged for. Based on a probabilistic…

- Peter Auer, Nicolò Cesa-Bianchi, Claudio Gentile
- J. Comput. Syst. Sci.
- 2000

Most of the performance bounds for on-line learning algorithms are proven assuming a constant learning rate. To optimize these bounds, the learning rate must be tuned based on quantities that are generally unknown, as they depend on the whole sequence of examples. In this paper we show that essentially the same optimized bounds can be obtained when the…

- Nicolò Cesa-Bianchi, Alex Conconi, Claudio Gentile
- IEEE Transactions on Information Theory
- 2001

In this paper, it is shown how to extract a hypothesis with small risk from the ensemble of hypotheses generated by an arbitrary on-line learning algorithm run on an independent and identically distributed (i.i.d.) sample of data. Using a simple large deviation argument, we prove tight data-dependent bounds for the risk of this hypothesis in terms of an…