Learn More
The general perception is that kernel methods are not scalable, so neural nets become the choice for large-scale nonlinear learning problems. Have we tried hard enough for kernel methods? In this paper, we propose an approach that scales up kernel methods using a novel concept called " doubly stochastic functional gradients ". Based on the fact that many(More)
We study the distributed computing setting in which there are multiple servers, each holding a set of points, who wish to compute functions on the union of their point sets. A key task in this setting is Principal Component Analysis (PCA), in which the servers would like to compute a low dimensional subspace capturing as much of the variance of the union of(More)
Can we learn the influence of a set of people in a social network from cascades of information diffusion? This question is often addressed by a two-stage approach: first learn a diffusion model, and then calculate the influence based on the learned model. Thus, the success of this approach relies heavily on the correctness of the diffusion model which is(More)
Nonlinear component analysis such as kernel Principle Component Analysis (KPCA) and kernel Canonical Correlation Analysis (KCCA) are widely used in machine learning, statistics and data analysis, and they serve as invaluable preprocessing tools for various purposes such as data exploration, dimension reduction and feature extraction. However, existing(More)
This paper proposes a distributed PCA algorithm, with the theoretical guarantee that any good approximation solution on the projected data for k-means clustering is also a good approximation on the original data, while the projected dimension required is independent of the original dimension. When combined with the distributed coreset-based clustering(More)
Word embeddings are ubiquitous in NLP and information retrieval, but it's unclear what they represent when the word is polysemous, i.e., has multiple senses. Here it is shown that multiple word senses reside in linear superposition within the word embedding and can be recovered by simple sparse coding. The success of the method —which applies to several(More)
One of the most widely used techniques for data clustering is agglomerative clustering. Such algorithms have been long used across many different fields ranging from computational biology to social sciences to computer vision in part because their output is easy to interpret. Unfortunately, it is well known, however, that many of the classic agglomerative(More)
The papers of Mikolov et al. 2013 as well as subsequent works have led to dramatic progress in solving word analogy tasks using semantic word embeddings. This leverages linear structure that is often found in the word embeddings, which is surprising since the training method is usually nonlinear. There were attempts —notably by Levy and Goldberg and(More)