Learnability and the Vapnik-Chervonenkis dimension
TLDR
This paper shows that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned.
The Weighted Majority Algorithm
TLDR
A simple and effective method based on weighted voting, called the Weighted Majority Algorithm, is introduced for constructing a compound prediction algorithm that is robust in the presence of errors in the data.
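Below is a minimal Python sketch of a weighted-majority-style update, assuming binary (0/1) expert votes and a penalty factor beta chosen purely for illustration; it is a simplified reading of the method described above, not the paper's exact formulation.

```python
# Weighted-majority-style voting: each expert's weight is multiplied by a
# penalty beta < 1 whenever that expert errs, and the master predicts the
# weighted majority vote. Function name and inputs are illustrative.

def weighted_majority(expert_predictions, labels, beta=0.5):
    """expert_predictions: list of rounds, each a list of 0/1 expert votes.
    labels: the true 0/1 outcome for each round. Returns master mistakes."""
    n = len(expert_predictions[0])
    weights = [1.0] * n
    mistakes = 0
    for votes, y in zip(expert_predictions, labels):
        # Weighted vote for outcome 1 versus outcome 0.
        vote_for_one = sum(w for w, v in zip(weights, votes) if v == 1)
        vote_for_zero = sum(w for w, v in zip(weights, votes) if v == 0)
        prediction = 1 if vote_for_one >= vote_for_zero else 0
        if prediction != y:
            mistakes += 1
        # Penalize every expert that was wrong this round.
        weights = [w * beta if v != y else w for w, v in zip(weights, votes)]
    return mistakes
```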
Exponentiated Gradient Versus Gradient Descent for Linear Predictors
TLDR
The bounds suggest that the losses of the two algorithms are in general incomparable, but that EG(+/-) has a much smaller loss when only a few components of the input are relevant for the predictions; the bounds are already quite tight on simple artificial data.
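A minimal sketch of an exponentiated-gradient-style update for on-line linear regression with square loss follows; the learning rate eta, the simplex-constrained weight vector, and the function name eg_regression are illustrative assumptions, not the tuned setting analyzed in the paper.

```python
import math

# Exponentiated-gradient-style update: weights stay on the probability
# simplex and are multiplied by the exponentiated negative gradient of the
# square loss, then renormalized.

def eg_regression(examples, eta=0.1):
    """examples: list of (x, y) pairs with x a list of floats."""
    n = len(examples[0][0])
    w = [1.0 / n] * n
    total_loss = 0.0
    for x, y in examples:
        y_hat = sum(wi * xi for wi, xi in zip(w, x))
        total_loss += (y_hat - y) ** 2
        # Gradient of (y_hat - y)^2 with respect to w_i is 2*(y_hat - y)*x_i.
        factors = [math.exp(-2.0 * eta * (y_hat - y) * xi) for xi in x]
        z = sum(wi * fi for wi, fi in zip(w, factors))
        w = [wi * fi / z for wi, fi in zip(w, factors)]
    return w, total_loss
```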
How to use expert advice
TLDR
This work analyzes algorithms that predict a binary value by combining the predictions of several prediction strategies, called 'experts', and shows how this leads to certain kinds of pattern recognition/learning algorithms with performance bounds that improve on the best results currently known in this context.
Relating Data Compression and Learnability
TLDR
It is demonstrated that the existence of a suitable data compression scheme is sufficient to ensure learnability, and that the introduced compression scheme provides a rigorous model for studying data compression in connection with machine learning.
On-Line Portfolio Selection Using Multiplicative Updates
We present an on-line investment algorithm that achieves almost the same wealth as the best constant-rebalanced portfolio determined in hindsight from the actual market outcomes.
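A minimal sketch of a multiplicative-update portfolio rule in the spirit of the entry above; the learning rate eta and the input format (daily price relatives) are assumptions for illustration, not the exact parameterization analyzed in the paper.

```python
import math

# Multiplicative portfolio update: after each day, stocks that contributed
# more to the day's return are up-weighted exponentially, then the weights
# are renormalized to form the next day's rebalanced portfolio.

def multiplicative_portfolio(price_relatives, eta=0.05):
    """price_relatives: list of days, each a list of closing/opening ratios.
    Returns the final portfolio weights and the achieved wealth factor."""
    n = len(price_relatives[0])
    w = [1.0 / n] * n
    wealth = 1.0
    for x in price_relatives:
        day_return = sum(wi * xi for wi, xi in zip(w, x))
        wealth *= day_return
        # Up-weight stocks in proportion to their share of today's return.
        factors = [math.exp(eta * xi / day_return) for xi in x]
        z = sum(wi * fi for wi, fi in zip(w, factors))
        w = [wi * fi / z for wi, fi in zip(w, factors)]
    return w, wealth
```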
Tracking the Best Expert
TLDR
The generalization allows the sequence to be partitioned into segments, and the goal is to bound the additional loss of the algorithm over the sum of the losses of the best experts for each segment, modeling situations in which the examples change and different experts are best for different segments of the sequence.
Relative Loss Bounds for On-Line Density Estimation with the Exponential Family of Distributions
TLDR
This work considers on-line density estimation with a parameterized density from the exponential family; a Bregman divergence is used to derive and analyze the algorithms, with the goal of designing algorithms with the best possible relative loss bounds.
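As one concrete instance of on-line density estimation with an exponential-family density, the sketch below estimates the mean of a fixed-variance Gaussian from running sufficient statistics with a prior count; the parameters a0, mu0, and sigma are illustrative assumptions, and the code does not reproduce the paper's Bregman-divergence analysis.

```python
import math

# On-line density estimation for a fixed-variance Gaussian: at each trial the
# mean is set from smoothed running statistics, the log-loss on the next point
# is paid, and the statistics are then updated with that point.

def online_gaussian_mean(points, a0=1.0, mu0=0.0, sigma=1.0):
    """points: a sequence of real observations. Returns the total log-loss."""
    count, total = a0, a0 * mu0          # smoothed sufficient statistics
    loss = 0.0
    for x in points:
        mu = total / count               # current parameter estimate
        # Negative log-likelihood of x under N(mu, sigma^2).
        loss += 0.5 * ((x - mu) / sigma) ** 2 + math.log(sigma * math.sqrt(2 * math.pi))
        count += 1
        total += x
    return loss
```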
Occam's Razor
Abstract
We show that a polynomial learning algorithm, as defined by Valiant (1984), is obtained whenever there exists a polynomial-time method of producing, for any sequence of observations, a …