#### Filter Results:

#### Publication Year

2008

2015

#### Publication Type

#### Co-author

#### Key Phrase

#### Publication Venue

#### Data Set Used

Learn More

Recently, there has been substantial interest in using large amounts of unlabeled data to learn word representations which can then be used as features in supervised classifiers for NLP tasks. However, most current approaches are slow to train, do not model the context of the word, and lack theoretical grounding. In this paper, we present a new learning… (More)

Spectral learning algorithms have recently become popular in data-rich domains, driven in part by recent advances in large scale randomized SVD, and in spectral estimation of Hidden Markov Models. Extensions of these methods lead to statistical estimation algorithms which are not only fast, scalable, and useful on real data sets, but are also provably… (More)

Unlabeled data is often used to learn representations which can be used to supplement baseline features in a supervised learner. For example, for text applications where the words lie in a very high dimensional space (the size of the vocabulary), one can learn a low rank " dictionary " by an eigen-decomposition of the word co-occurrence matrix (e.g. using… (More)

We propose a framework MIC (Multiple Inclusion Criterion) for learning sparse models based on the information theoretic Minimum Description Length (MDL) principle. MIC provides an elegant way of incorporating arbitrary sparsity patterns in the feature space by using two-part MDL coding schemes. We present MIC based models for the problems of grouped feature… (More)

Recently there has been substantial interest in using spectral methods to learn generative sequence models like HMMs. Spectral methods are attractive as they provide globally consistent estimates of the model parameters and are very fast and scalable, unlike EM methods , which can get stuck in local minima. In this paper, we present a novel extension of… (More)

We propose a fast algorithm for ridge regression when the number of features is much larger than the number of observations (p n). The standard way to solve ridge regression in this setting works in the dual space and gives a running time of O(n 2 p). Our algorithm Subsampled Randomized Hadamard Transform-Dual Ridge Regression (SRHT-DRR) runs in time O(np… (More)

We compare the risk of ridge regression to a simple variant of ordinary least squares, in which one simply projects the data onto a finite dimensional subspace (as specified by a principal component analysis) and then performs an ordinary (un-regularized) least squares regression in this subspace. This note shows that the risk of this ordinary least squares… (More)

In this paper we propose a new approach for semi-supervised structured output learning. Our approach uses relaxed labeling on un-labeled data to deal with the combinatorial nature of the label space and further uses domain constraints to guide the learning. Since the overall objective is non-convex, we alternate between the optimization of the model… (More)

We address the problem of fast estimation of ordinary least squares (OLS) from large amounts of data (n p). We propose three methods which solve the big data problem by subsampling the covariance matrix using either a single or two stage estimation. All three run in the order of size of input i.e. O(np) and our best method, Uluru, gives an error bound of… (More)

Graph-based semi-supervised learning (SSL) methods usually consist of two stages: in the first stage, a graph is constructed from the set of input instances; and in the second stage, the available label information along with the constructed graph is used to assign labels to the unlabeled instances. Most of the previously proposed graph construction methods… (More)