
- Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, Manfred K. Warmuth
- J. ACM
- 1989

Valiant's learnability model is extended to learning classes of concepts defined by regions in Euclidean space $E^n$. The methods in this paper lead to a unified treatment of some of Valiant's results, along with previous results on distribution-free convergence of certain pattern recognition algorithms. It is shown that…

- Nick Littlestone, Manfred K. Warmuth
- Inf. Comput.
- 1989

We study the construction of prediction algorithms in a situation in which a learner faces a sequence of trials, with a prediction to be made in each, and the goal of the learner is to make few mistakes. We are interested in the case where the learner has reason to believe that one of some pool of known algorithms will perform well, but the learner does not know…
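
The mistake-driven scheme this abstract describes (combining a pool of predictors while tracking mistakes) can be illustrated with a Weighted-Majority-style vote. This is a minimal sketch under assumptions not stated in the truncated text: binary {0, 1} predictions, a halving factor `beta`, and a tie broken in favor of 1.

```python
def weighted_majority(expert_preds, outcomes, beta=0.5):
    """Predict by weighted vote; multiply a mistaken expert's weight by beta.

    expert_preds: list of per-trial lists, expert_preds[t][i] in {0, 1}
    outcomes:     list of true bits, outcomes[t] in {0, 1}
    Returns the number of mistakes made by the master algorithm.
    """
    n = len(expert_preds[0])
    weights = [1.0] * n
    mistakes = 0
    for preds, y in zip(expert_preds, outcomes):
        # weighted vote over the pool of experts
        vote_1 = sum(w for w, p in zip(weights, preds) if p == 1)
        vote_0 = sum(w for w, p in zip(weights, preds) if p == 0)
        guess = 1 if vote_1 >= vote_0 else 0
        if guess != y:
            mistakes += 1
        # demote every expert that erred on this trial
        weights = [w * beta if p != y else w
                   for w, p in zip(weights, preds)]
    return mistakes
```

With `beta < 1`, mistaken experts lose influence geometrically, so a single good algorithm in the pool eventually dominates the vote.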

- Mark Herbster, Manfred K. Warmuth
- Machine Learning
- 1995

We generalize the recent relative loss bounds for on-line algorithms where the additional loss of the algorithm on the whole sequence of examples over the loss of the best expert is bounded. The generalization allows the sequence to be partitioned into segments, and the goal is to bound the additional loss of the algorithm over the sum of the losses of the…
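
One standard way to compete with a segmented comparator like this is to add a "share" step to an exponential-weights update, so no expert's weight ever decays to zero and a past loser can be picked up again after a segment boundary. The sketch below is illustrative; the rates `eta` and `alpha` are assumed parameters, not values from the paper.

```python
import math

def fixed_share_step(weights, losses, eta=1.0, alpha=0.05):
    """One trial of a fixed-share-style update.

    weights: current probability vector over experts
    losses:  loss of each expert on this trial
    Returns the updated probability vector.
    """
    n = len(weights)
    # loss update: exponential weighting, then renormalize
    updated = [w * math.exp(-eta, ) if False else w * math.exp(-eta * l)
               for w, l in zip(weights, losses)]
    total = sum(updated)
    updated = [w / total for w in updated]
    # share update: redistribute a fraction alpha uniformly,
    # keeping every weight at least alpha / n
    return [(1 - alpha) * w + alpha / n for w in updated]
```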

- Jyrki Kivinen, Manfred K. Warmuth
- Inf. Comput.
- 1997

We consider two algorithms for on-line prediction based on a linear model. The algorithms are the well-known gradient descent (GD) algorithm and a new algorithm, which we call EG. They both maintain a weight vector using simple updates. For the GD algorithm, the update is based on subtracting the gradient of the squared error made on a prediction. The EG…
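
The two updates can be sketched side by side for the squared loss on a linear model. The learning rate `eta` and the data are illustrative assumptions; the contrast shown is additive (GD) versus multiplicative-with-renormalization (EG, which keeps the weight vector a probability distribution).

```python
import math

def gd_update(w, x, y, eta=0.1):
    """GD: subtract the gradient of the squared error (yhat - y)^2."""
    yhat = sum(wi * xi for wi, xi in zip(w, x))
    return [wi - eta * 2 * (yhat - y) * xi for wi, xi in zip(w, x)]

def eg_update(w, x, y, eta=0.1):
    """EG: multiply each weight by an exponential of the gradient
    component, then renormalize to a probability vector."""
    yhat = sum(wi * xi for wi, xi in zip(w, x))
    factors = [wi * math.exp(-eta * 2 * (yhat - y) * xi)
               for wi, xi in zip(w, x)]
    z = sum(factors)
    return [f / z for f in factors]
```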

We analyze algorithms that predict a binary value by combining the predictions of several prediction strategies, called *experts*. Our analysis is for worst-case situations, i.e., we make no assumptions about the way the sequence of bits to be predicted is generated. We measure the performance of the algorithm by the difference between the…

We explore the learnability of two-valued functions from samples using the paradigm of data compression. A first algorithm (compression) chooses a small subset of the sample, which is called the kernel. A second algorithm predicts future values of the function from the kernel, i.e., the algorithm acts as a hypothesis for the function to be learned. The second…

- Katy S. Azoury, Manfred K. Warmuth
- Machine Learning
- 1999

We consider on-line density estimation with a parameterized density from the exponential family. The on-line algorithm receives one example at a time and maintains a parameter that is essentially an average of the past examples. After receiving an example the algorithm incurs a loss, which is the negative log-likelihood of the example with respect to the…
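
As a concrete (assumed) instance of this setting, take a unit-variance Gaussian, one member of the exponential family: the maintained parameter is the running average of the examples seen so far, and each trial's loss is the negative log-likelihood of the incoming example under the current parameter.

```python
import math

def online_gaussian_nll(examples, prior_mean=0.0):
    """Return (total on-line loss, final parameter) for a unit-variance
    Gaussian whose mean parameter tracks the average of past examples."""
    mu, count, total_loss = prior_mean, 0, 0.0
    for x in examples:
        # loss incurred on this trial: -log N(x; mu, 1)
        total_loss += 0.5 * (x - mu) ** 2 + 0.5 * math.log(2 * math.pi)
        # parameter update: mu becomes the average of examples seen so far
        # (the first update overwrites the assumed prior mean)
        count += 1
        mu += (x - mu) / count
    return total_loss, mu
```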

- Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, Manfred K. Warmuth
- Inf. Process. Lett.
- 1987

- Sally Floyd, Manfred K. Warmuth
- Machine Learning
- 1995

Within the framework of pac-learning, we explore the learnability of concepts from samples using the paradigm of sample compression schemes. A sample compression scheme of size k for a concept class $C \subseteq 2^X$ consists of a compression function and a reconstruction function. The compression function receives a finite sample set consistent with some…
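
A classic toy instance of such a scheme, chosen here for illustration rather than taken from the paper's text, is the class of closed intervals on the real line, which admits a sample compression scheme of size 2: the kernel is just the leftmost and rightmost positive examples.

```python
def compress(sample):
    """sample: list of (x, label) pairs consistent with some interval.
    Kernel: the leftmost and rightmost positive points (size <= 2)."""
    positives = [x for x, label in sample if label]
    return (min(positives), max(positives)) if positives else ()

def reconstruct(kernel):
    """Turn a kernel back into a hypothesis (a predicate on points)."""
    if not kernel:
        return lambda x: False
    lo, hi = kernel
    return lambda x: lo <= x <= hi
```

The reconstructed hypothesis classifies every point of the original sample correctly, which is exactly the consistency requirement a compression scheme must satisfy.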

- Olivier Bousquet, Manfred K. Warmuth
- Journal of Machine Learning Research
- 2001

In this paper, we examine on-line learning problems in which the target concept is allowed to change over time. In each trial a master algorithm receives predictions from a large set of n experts. Its goal is to predict almost as well as the best sequence of such experts chosen off-line by partitioning the training sequence into k+1 sections and then…