
- Jyrki Kivinen, Alexander J. Smola, Robert C. Williamson
- IEEE Transactions on Signal Processing
- 2001

Kernel-based algorithms such as support vector machines have achieved considerable success in various problems in the batch setting, where all of the training data is available in advance. Support vector machines combine the so-called kernel trick with the large-margin idea. There has been little use of these methods in an online setting suitable for real-time…
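The online kernel approach this abstract describes (the algorithm introduced in this line of work is known as NORMA) amounts to stochastic gradient descent in the kernel-induced feature space: the hypothesis is a kernel expansion whose coefficients decay under regularization and grow when a new example incurs loss. A minimal sketch, assuming a hinge loss and an RBF kernel; the function names, step size, and regularization constant are illustrative, not taken from the paper:

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """RBF kernel between two feature tuples."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2 * sigma ** 2))

def predict(support, coeffs, x):
    """Kernel expansion: f(x) = sum_i alpha_i k(x_i, x)."""
    return sum(a * gaussian_kernel(xi, x) for a, xi in zip(coeffs, support))

def norma_step(support, coeffs, x, y, eta=0.1, lam=0.01):
    """One online update: shrink all old coefficients (the effect of the
    regularizer) and, if the hinge loss is active on the new example,
    add it to the expansion with coefficient eta * y."""
    margin = y * predict(support, coeffs, x)
    coeffs[:] = [(1 - eta * lam) * a for a in coeffs]
    if margin < 1:  # hinge loss is nonzero: store the example
        support.append(x)
        coeffs.append(eta * y)
```

Because each lossy example enters the expansion, the regularization-driven decay is what keeps the support set's influence bounded over a long stream.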

- Jyrki Kivinen, Manfred K. Warmuth
- Inf. Comput.
- 1997

We consider two algorithms for on-line prediction based on a linear model. The algorithms are the well-known gradient descent (GD) algorithm and a new algorithm, which we call EG. They both maintain a weight vector using simple updates. For the GD algorithm, the update is based on subtracting the gradient of the squared error made on a prediction. The EG…
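The two updates the abstract contrasts can each be stated in a few lines: GD subtracts the gradient of the squared loss, while EG multiplies each weight by an exponential factor of its gradient component and renormalizes, so the weights stay a probability vector. A minimal sketch for a single example under squared loss; the function names and learning rate are illustrative:

```python
import math

def gd_step(w, x, y, eta=0.1):
    """Gradient descent: subtract the squared-loss gradient
    (y_hat - y) * x from the weight vector."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [wi - eta * err * xi for wi, xi in zip(w, x)]

def eg_step(w, x, y, eta=0.1):
    """Exponentiated gradient: multiply each weight by
    exp(-eta * gradient component), then renormalize so the
    weights remain a probability vector."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    unnorm = [wi * math.exp(-eta * err * xi) for wi, xi in zip(w, x)]
    z = sum(unnorm)
    return [u / z for u in unnorm]
```

The multiplicative form is what gives EG its characteristic advantage when the target weight vector is sparse relative to the input dimension.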

- Jyrki Kivinen, Manfred K. Warmuth
- STOC
- 1995

We consider two algorithms for on-line prediction based on a linear model. The algorithms are the well-known Gradient Descent (GD) algorithm and a new algorithm, which we call EG. They both maintain a weight vector using simple updates. For the GD algorithm, the weight vector is updated by subtracting from it the gradient of the squared error made on a…

- Jyrki Kivinen, Heikki Mannila
- Theor. Comput. Sci.
- 1995

- David Haussler, Jyrki Kivinen, Manfred K. Warmuth
- IEEE Trans. Information Theory
- 1998

We consider adaptive sequential prediction of arbitrary binary sequences when the performance is evaluated using a general loss function. The goal is to predict on each individual sequence nearly as well as the best prediction strategy in a given comparison class of (possibly adaptive) prediction strategies, called experts. By using a general loss function,…

- Jyrki Kivinen, Manfred K. Warmuth
- COLT
- 1999

We consider the AdaBoost procedure for boosting weak learners. In AdaBoost, a key step is choosing a new distribution on the training examples based on the old distribution and the mistakes made by the present weak hypothesis. We show how AdaBoost’s choice of the new distribution can be seen as an approximate solution to the following problem: Find a new…
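The update in question is AdaBoost's usual exponential reweighting, and the relative-entropy view explains its key property: the new distribution is exactly decorrelated from the current hypothesis's mistakes. A minimal sketch of that reweighting step (the function name and the ±1 margin encoding are illustrative):

```python
import math

def adaboost_reweight(dist, margins):
    """AdaBoost's distribution update: d_i <- d_i * exp(-alpha * y_i h(x_i)) / Z,
    with alpha chosen from the weighted error of the current hypothesis.
    margins[i] is y_i * h(x_i) in {-1, +1}; dist must sum to 1,
    and the hypothesis must have error strictly between 0 and 1."""
    # weighted error of the current weak hypothesis
    eps = sum(d for d, m in zip(dist, margins) if m < 0)
    alpha = 0.5 * math.log((1 - eps) / eps)
    unnorm = [d * math.exp(-alpha * m) for d, m in zip(dist, margins)]
    z = sum(unnorm)
    return [u / z for u in unnorm]
```

With this choice of alpha, the correctly classified examples and the mistakes each carry exactly half the mass afterward, so the current hypothesis looks like random guessing under the new distribution.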

- Jyrki Kivinen, Manfred K. Warmuth
- EuroCOLT
- 1999

We consider algorithms for combining advice from a set of experts. In each trial, the algorithm receives the predictions of the experts and produces its own prediction. A loss function is applied to measure the discrepancy between the predictions and actual observations. The algorithm keeps a weight for each expert. At each trial the weights are first used…
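The scheme described is an exponentially weighted average forecaster: predict with the weighted average of the experts' predictions, then shrink each expert's weight by an exponential factor of its loss. A minimal sketch; the squared loss, the function names, and the learning rate are assumptions for illustration, not fixed by the abstract:

```python
import math

def hedge_predict(weights, expert_preds):
    """Weighted-average prediction from the current expert weights."""
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, expert_preds)) / total

def hedge_update(weights, expert_preds, outcome, eta=1.0):
    """Multiply each expert's weight by exp(-eta * loss);
    here the loss is the squared error of that expert's prediction."""
    return [w * math.exp(-eta * (p - outcome) ** 2)
            for w, p in zip(weights, expert_preds)]
```

An expert that keeps making errors has its weight driven down exponentially, so the forecaster's predictions track the best expert in hindsight.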

- Jyrki Kivinen, Heikki Mannila
- ICDT
- 1992

- Wouter M. Koolen, Manfred K. Warmuth, Jyrki Kivinen
- COLT
- 2010

We develop an online algorithm called Component Hedge for learning structured concept classes when the loss of a structured concept sums over its components. Example classes include paths through a graph (composed of edges) and partial permutations (composed of assignments). The algorithm maintains a parameter vector with one non-negative weight per…
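For the permutation example, one weight per assignment means the parameter is a doubly stochastic matrix: after the multiplicative update on each component, the algorithm projects back onto that polytope under relative entropy, which for doubly stochastic matrices is Sinkhorn row/column balancing. A minimal sketch for full permutations, with an illustrative learning rate and a fixed number of balancing iterations in place of running the projection to convergence:

```python
import math

def component_hedge_step(W, L, eta=0.5, iters=50):
    """One Component Hedge update for permutations.
    W is a doubly stochastic matrix of component weights; L[i][j] is the
    loss of assigning item i to slot j.  Each entry is multiplied by
    exp(-eta * loss), then projected back onto the doubly stochastic
    polytope by alternating row/column normalization (Sinkhorn)."""
    n = len(W)
    W = [[W[i][j] * math.exp(-eta * L[i][j]) for j in range(n)]
         for i in range(n)]
    for _ in range(iters):
        for i in range(n):            # normalize rows
            s = sum(W[i])
            W[i] = [w / s for w in W[i]]
        for j in range(n):            # normalize columns
            s = sum(W[i][j] for i in range(n))
            for i in range(n):
                W[i][j] /= s
    return W
```

Keeping one weight per component instead of one weight per permutation is what makes the update polynomial in n rather than exponential.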

- Jyrki Kivinen, Manfred K. Warmuth
- Machine Learning
- 1997

We study on-line generalized linear regression with multidimensional outputs, i.e., neural networks with multiple output nodes but no hidden nodes. We allow transfer functions at the final layer, such as the softmax function, that must consider the linear activations of all the output neurons. The weight vectors used to produce the linear activations are…
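A network with multiple output nodes, no hidden nodes, and a softmax transfer function is multinomial logistic regression; the log-loss gradient couples all outputs through the softmax normalizer, which is why the transfer function must see every linear activation at once. A minimal online gradient-descent sketch of one such update (the paper studies a family of updates; the names and learning rate here are illustrative):

```python
import math

def softmax(z):
    """Softmax over the vector of linear activations."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def sgd_softmax_step(W, x, label, eta=0.5):
    """One online gradient step for a softmax output layer.
    W holds one weight vector per output node; the log-loss
    gradient for node k is (p_k - [k == label]) * x, so every
    node's update depends on all activations via the softmax."""
    probs = softmax([sum(wi * xi for wi, xi in zip(w, x)) for w in W])
    for k in range(len(W)):
        grad_scale = probs[k] - (1.0 if k == label else 0.0)
        W[k] = [wi - eta * grad_scale * xi for wi, xi in zip(W[k], x)]
```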