
- Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, Il
- 2005

We present a family of margin-based online learning algorithms for various prediction tasks. In particular we derive and analyze algorithms for binary and multiclass categorization, regression, uniclass prediction and sequence prediction. All of the algorithms we present can utilize kernel functions. The update steps of our different algorithms are all…

A common problem of kernel-based online algorithms, such as the kernel-based Perceptron algorithm, is the amount of memory required to store the online hypothesis, which may increase without bound as the algorithm progresses. Furthermore, the computational load of such algorithms grows linearly with the amount of memory used to store the hypothesis. To…
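The memory growth described above can be seen in a minimal sketch of the standard kernel Perceptron. This is not the algorithm proposed in the truncated abstract; the class name `KernelPerceptron`, the Gaussian kernel, and the `gamma` parameter are illustrative assumptions.

```python
import math


def gaussian_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length sequences of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


class KernelPerceptron:
    """Textbook kernel Perceptron (illustrative, not the paper's method).

    The hypothesis is stored as the list of examples on which a mistake
    was made, so memory and per-prediction cost both grow with the
    number of mistakes.
    """

    def __init__(self, kernel=gaussian_kernel):
        self.kernel = kernel
        self.support = []  # (instance, label) pairs, one per mistake

    def score(self, x):
        # Cost is linear in len(self.support): one kernel call per stored example.
        return sum(y_i * self.kernel(x_i, x) for x_i, y_i in self.support)

    def update(self, x, y):
        """Process one example with label y in {-1, +1}; return True on a mistake."""
        mistake = y * self.score(x) <= 0
        if mistake:
            self.support.append((x, y))  # hypothesis grows by one example
        return mistake
```

Each mistake appends one example to `support`, so without some form of budget the stored hypothesis can grow for as long as the algorithm keeps making mistakes.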

In discriminative machine learning one is interested in training a system to optimize a certain desired measure of performance, or loss. In binary classification one typically tries to minimize the error rate. But in structured prediction each task often has its own measure of performance such as the BLEU score in machine translation or the…

We present a discriminative online algorithm with bounded memory growth, which is based on the kernel-based Perceptron. Generally, the required memory of the kernel-based Perceptron for storing the online hypothesis is not bounded. Previous work has focused on discarding part of the instances in order to keep the memory bounded. In the proposed…

We present an algorithmic framework for supervised classification learning where the set of labels is organized in a predefined hierarchical structure. This structure is encoded by a rooted tree which induces a metric over the label set. Our approach combines ideas from large margin kernel methods and Bayesian analysis. Following the large margin principle,…
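One way a rooted tree can induce a metric over labels is through path length. The sketch below assumes the distance between two labels is the number of edges on the path joining them; the abstract does not spell out the metric, and the `parent` dictionary encoding and function names are hypothetical.

```python
def path_to_root(parent, label):
    """Return the list of nodes from `label` up to the root (inclusive)."""
    path = [label]
    while parent[label] is not None:
        label = parent[label]
        path.append(label)
    return path


def tree_distance(parent, u, v):
    """Number of edges on the path between labels u and v.

    `parent` maps each node to its parent, with the root mapped to None
    (an assumed encoding of the label hierarchy).
    """
    # Map every ancestor of u (including u) to its distance from u.
    steps_from_u = {node: d for d, node in enumerate(path_to_root(parent, u))}
    # Walk up from v until the first common ancestor is reached.
    for steps_from_v, node in enumerate(path_to_root(parent, v)):
        if node in steps_from_u:
            return steps_from_u[node] + steps_from_v
    raise ValueError("labels are not in the same tree")
```

Under this assumption, confusing two sibling labels costs less than confusing labels from different subtrees, which is the kind of graded penalty a hierarchy-aware classifier can exploit.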

We consider the problem of binary classification where the classifier may abstain instead of classifying each observation. The Bayes decision rule for this setup, known as Chow's rule, is defined by two thresholds on posterior probabilities. From simple desiderata, namely the consistency and the sparsity of the classifier, we derive the double hinge loss…
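A two-threshold decision rule of the kind described above can be sketched as follows. The function name, the threshold parameters `lower` and `upper`, and the choice to abstain exactly at a threshold are assumptions made for illustration; the abstract does not give the threshold values.

```python
def chow_decision(p_pos, lower, upper):
    """Classify with a reject option from the posterior P(y = +1 | x).

    Returns +1 above `upper`, -1 below `lower`, and 0 (abstain) in between.
    Abstaining exactly at a threshold is an arbitrary tie-breaking choice.
    """
    if not 0.0 <= lower <= upper <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= lower <= upper <= 1")
    if p_pos > upper:
        return 1
    if p_pos < lower:
        return -1
    return 0
```

For example, with `lower=0.3` and `upper=0.7`, a posterior of 0.9 yields the positive class, 0.1 the negative class, and 0.5 an abstention.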

We describe a new approach for phoneme recognition which aims at minimizing the phoneme error rate. Building on structured prediction techniques, we formulate the phoneme recognizer as a linear combination of feature functions. We state a PAC-Bayesian generalization bound, which gives an upper bound on the expected phoneme error rate in terms of the…

We describe a new method for phoneme sequence recognition given a speech utterance, which is not based on the HMM. In contrast to HMM-based approaches, our method uses a discriminative kernel-based training procedure in which the learning process is tailored to the goal of minimizing the Levenshtein distance between the predicted phoneme sequence and the…
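The Levenshtein distance mentioned above counts the minimum number of insertions, deletions, and substitutions needed to turn one sequence into another. A minimal sketch of the standard dynamic program, applied to sequences of phoneme symbols, is shown here; it illustrates the evaluation measure only, not the training procedure, and the function name is an assumption.

```python
def levenshtein(a, b):
    """Edit distance between sequences a and b with unit costs."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,             # delete x
                cur[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]
```

The function works on any sequences of comparable items, so phoneme sequences can be passed as lists of strings such as `["k", "ae", "t"]`.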

We present a method for efficiently training binary and multiclass kernelized SVMs on a Graphics Processing Unit (GPU). Our methods apply to a broad range of kernels, including the popular Gaussian kernel, on datasets as large as the amount of available memory on the graphics card. Our approach is distinguished from earlier work in that it cleanly and…

The focus of the paper is the problem of learning kernel operators from empirical data. We cast the kernel design problem as the construction of an accurate kernel from simple (and less accurate) base kernels. We use the boosting paradigm to perform the kernel construction process. To do so, we modify the booster so as to accommodate kernel operators. We…