#### Filter Results:

#### Publication Year

1996

2016

#### Publication Type

#### Co-author

#### Key Phrase

#### Publication Venue

#### Data Set Used

Learn More

Stochastic gradient descent is popular for large scale optimization but has slow convergence asymptotically due to the inherent variance. To remedy this problem , we introduce an explicit variance reduction method for stochastic gradient descent which we call stochastic variance reduced gradient (SVRG). For smooth and strongly convex functions, we prove… (More)

Hidden Markov Models (HMMs) are one of the most fundamental and widely used statistical tools for modeling discrete time series. Typically, they are learned using search heuristics (such as the Baum-Welch / EM algorithm), which suffer from the usual local optima issues. While in general these models are known to be hard to learn with samples from the… (More)

One of the most important issues in machine learning is whether one can improve the performance of a supervised learning algorithm by including unlabeled data. Methods that use both labeled and unlabeled data are generally referred to as semi-supervised learning. Although a number of such methods are proposed, at the current stage, we still don't have a… (More)

We consider the problem of minimizing the sum of two convex functions: one is the average of a large number of smooth component functions, and the other is a general convex function that admits a simple proximal mapping. We assume the whole objective function is strongly convex. Such problems often arise in machine learning, known as regularized empirical… (More)

We consider learning formulations with non-convex objective functions that often occur in practical applications. There are two approaches to this problem: • Heuristic methods such as gradient descent that only find a local minimum. A drawback of this approach is the lack of theoretical guarantee showing that the local minimum gives a good solution. •… (More)

- Kai Yu, Tong Zhang, Yihong Gong
- NIPS
- 2009

This paper introduces a new method for semi-supervised learning on high dimensional nonlinear manifolds, which includes a phase of unsupervised basis learning and a phase of supervised function learning. The learned bases provide a set of anchor points to form a local coordinate system, such that each data point x on the manifold can be locally approximated… (More)

This paper presents a classifier-combination experimental framework for named entity recognition in which four diverse classi-fiers (robust linear classifier, maximum en-tropy, transformation-based learning, and hidden Markov model) are combined under different conditions. When no gazetteer or other additional training resources are used, the combined… (More)

This paper investigates a new learning formulation called <i>structured sparsity</i>, which is a natural extension of the standard sparsity concept in statistical learning and compressive sensing. By allowing arbitrary structures on the feature set, this concept generalizes the group sparsity idea. A general theory is developed for learning with structured… (More)

A number of linear classification methods such as the linear least squares fit (LLSF), logistic regression, and support vector machines (SVM's) have been applied to text categorization problems. These methods share the similarity by finding hyperplanes that approximately separate a class of document vectors from its complement. However, support vector… (More)

We consider multi-label prediction problems with large output spaces under the assumption of output sparsity – that the target (label) vectors have small support. We develop a general theory for a variant of the popular error correcting output code scheme, using ideas from compressed sensing for exploiting this sparsity. The method can be regarded as a… (More)