- Peter Auer, Nicolò Cesa-Bianchi, Paul Fischer
- Machine Learning
- 2002

Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the search for a balance between exploring the environment to find profitable actions while taking the empirically best action as often as possible. A popular measure of a policy's success in addressing this dilemma is the regret, that is the loss due to the fact that the…

- Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, Robert E. Schapire
- SIAM J. Comput.
- 2002

In the multiarmed bandit problem, a gambler must decide which arm of K nonidentical slot machines to play in a sequence of trials so as to maximize his reward. This classical problem has received much attention because of the simple model it provides of the trade-off between exploration (trying out each arm to find the best one) and exploitation (playing…

- Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, Robert E. Schapire
- Electronic Colloquium on Computational Complexity
- 1995

In the multi-armed bandit problem, a gambler must decide which arm of K non-identical slot machines to play in a sequence of trials so as to maximize his reward. This classical problem has received much attention because of the simple model it provides of the trade-off between exploration (trying out each arm to find the best one) and exploitation (playing…

- Peter Auer
- Journal of Machine Learning Research
- 2002

We show how a standard tool from statistics, namely confidence bounds, can be used to elegantly deal with situations which exhibit an exploitation-exploration trade-off. Our technique for designing and analyzing algorithms for such situations is general and can be applied when an algorithm has to make exploitation-versus-exploration decisions based on…

- Andreas Opelt, Axel Pinz, Michael Fussenegger, Peter Auer
- IEEE Transactions on Pattern Analysis and Machine…
- 2006

This paper explores the power and the limitations of weakly supervised categorization. We present a complete framework that starts with the extraction of various local regions of either discontinuity or homogeneity. A variety of local descriptors can be applied to form a set of feature vectors for each local region. Boosting is used to learn a subset of…

- Andreas Opelt, Michael Fussenegger, Axel Pinz, Peter Auer
- ECCV
- 2004

In this paper we describe the first stage of a new learning system for object detection and recognition. For our system we propose Boosting [5] as the underlying learning technique. This allows the use of very diverse sets of visual features in the learning process within a common framework: Boosting, together with a weak hypotheses finder, may choose…

- Peter Auer, Thomas Jaksch, Ronald Ortner
- NIPS
- 2008

For undiscounted reinforcement learning in Markov decision processes (MDPs) we consider the total regret of a learning algorithm with respect to an optimal policy. In order to describe the transition structure of an MDP we propose a new parameter: An MDP has diameter D if for any pair of states s, s′ there is a policy which moves from s to s′ in at most D…

- Peter Auer, Nicolò Cesa-Bianchi, Claudio Gentile
- J. Comput. Syst. Sci.
- 2000

Most of the performance bounds for on-line learning algorithms are proven assuming a constant learning rate. To optimize these bounds, the learning rate must be tuned based on quantities that are generally unknown, as they depend on the whole sequence of examples. In this paper we show that essentially the same optimized bounds can be obtained when the…

We consider the problem of selecting, from among the arms of a stochastic n-armed bandit, a subset of size m of those arms with the highest expected rewards, based on efficiently sampling the arms. This “subset selection” problem finds application in a variety of areas. In the authors’ previous work (Kalyanakrishnan & Stone, 2010), this problem is framed…

- Peter Auer, Ronald Ortner
- Periodica Mathematica Hungarica
- 2010

In the stochastic multi-armed bandit problem we consider a modification of the UCB algorithm of Auer et al. [4]. For this modified algorithm we give an improved bound on the regret with respect to the optimal reward. While for the original UCB algorithm the regret in K-armed bandits after T trials is bounded by const · K log(T)/Δ, where Δ measures…