#### Filter Results:

#### Publication Year

1996

2016

#### Publication Type

#### Co-author

#### Publication Venue

#### Data Set Used

#### Key Phrases

Learn More

- Rie Johnson, Tong Zhang
- NIPS
- 2013

Stochastic gradient descent is popular for large scale optimization but has slow convergence asymptotically due to the inherent variance. To remedy this problem , we introduce an explicit variance reduction method for stochastic gradient descent which we call stochastic variance reduced gradient (SVRG). For smooth and strongly convex functions, we prove… (More)

- TONG ZHANG
- 2004

We study how closely the optimal Bayes error rate can be approximately reached using a classification algorithm that computes a classifier by minimizing a convex upper bound of the classification error function. The measurement of closeness is characterized by the loss function used in the estimation. We show that such a classification scheme can be… (More)

- Daniel J. Hsu, Sham M. Kakade, Tong Zhang
- COLT
- 2009

Hidden Markov Models (HMMs) are one of the most fundamental and widely used statistical tools for modeling discrete time series. Typically, they are learned using search heuristics (such as the Baum-Welch / EM algorithm), which suffer from the usual local optima issues. While in general these models are known to be hard to learn with samples from the… (More)

- Rie Kubota Ando, Tong Zhang
- Journal of Machine Learning Research
- 2005

One of the most important issues in machine learning is whether one can improve the performance of a supervised learning algorithm by including unlabeled data. Methods that use both labeled and unlabeled data are generally referred to as semi-supervised learning. Although a number of such methods are proposed, at the current stage, we still don't have a… (More)

Stochastic Gradient Descent (SGD) has become popular for solving large scale supervised machine learning optimization problems such as SVM, due to their strong theoretical guarantees. While the closely related Dual Coordinate Ascent (DCA) method has been implemented in various software packages, it has so far lacked good convergence analysis. This paper… (More)

- Tong Zhang
- Journal of Machine Learning Research
- 2010

We consider learning formulations with non-convex objective functions that often occur in practical applications. There are two approaches to this problem: • Heuristic methods such as gradient descent that only find a local minimum. A drawback of this approach is the lack of theoretical guarantee showing that the local minimum gives a good solution. •… (More)

- Lin Xiao, Tong Zhang
- SIAM Journal on Optimization
- 2014

We consider the problem of minimizing the sum of two convex functions: one is the average of a large number of smooth component functions, and the other is a general convex function that admits a simple proximal mapping. We assume the whole objective function is strongly convex. Such problems often arise in machine learning, known as regularized empirical… (More)

- Kai Yu, Tong Zhang, Yihong Gong
- NIPS
- 2009

This paper introduces a new method for semi-supervised learning on high dimensional nonlinear manifolds, which includes a phase of unsupervised basis learning and a phase of supervised function learning. The learned bases provide a set of anchor points to form a local coordinate system, such that each data point x on the manifold can be locally approximated… (More)

- Tong Zhang, Frank J. Oles
- Inf. Retr.
- 2001

A number of linear classification methods such as the linear least squares fit (LLSF), logistic regression, and support vector machines (SVM's) have been applied to text categorization problems. These methods share the similarity by finding hyperplanes that approximately separate a class of document vectors from its complement. However, support vector… (More)

- Radu Florian, Abraham Ittycheriah, Hongyan Jing, Tong Zhang
- CoNLL
- 2003

This paper presents a classifier-combination experimental framework for named entity recognition in which four diverse classi-fiers (robust linear classifier, maximum en-tropy, transformation-based learning, and hidden Markov model) are combined under different conditions. When no gazetteer or other additional training resources are used, the combined… (More)