Yingyu Liang

Learn More
The general perception is that kernel methods are not scalable, so neural nets become the choice for large-scale nonlinear learning problems. Have we tried hard enough for kernel methods? In this paper, we propose an approach that scales up kernel methods using a novel concept called “doubly stochastic functional gradients”. Based on the fact that many(More)
The success of neural network methods for computing word embeddings has motivated methods for generating semantic embeddings of longer pieces of text, such as sentences and paragraphs. Surprisingly, Wieting et al (ICLR’16) showed that such complicated methods are outperformed, especially in out-of-domain (transfer learning) settings, by simpler methods(More)
One of the most widely used techniques for data clustering is agglomerative clustering. Such algorithms have been long used across many different fields ranging from computational biology to social sciences to computer vision in part because their output is easy to interpret. Unfortunately, it is well known, however, that many of the classic agglomerative(More)
Generalization is defined training of generative adversarial network (GAN), and it’s shown that generalization is not guaranteed for the popular distances between distributions such as Jensen-Shannon or Wasserstein. In particular, training may appear to be successful and yet the trained distribution may be arbitrarily far from the target distribution in(More)
Word embeddings are ubiquitous in NLP and information retrieval, but it’s unclear what they represent when the word is polysemous, i.e., has multiple senses. Here it is shown that multiple word senses reside in linear superposition within the word embedding and can be recovered by simple sparse coding. The success of the method —which applies to several(More)
Semantic word embeddings represent the meaning of a word via a vector, and are created by diverse methods. Many use nonlinear operations on co-occurrence statistics, and have hand-tuned hyperparameters and reweighting methods. This paper proposes a new generative model, a dynamic version of the log-linear topic model of Mnih and Hinton (2007). The(More)
We study the distributed computing setting in which there are multiple servers,<lb>each holding a set of points, who wish to compute functions on the union of their<lb>point sets. A key task in this setting is Principal Component Analysis (PCA), in<lb>which the servers would like to compute a low dimensional subspace capturing as<lb>much of the variance of(More)
Generative models for deep learning are promising both to improve understanding of the model, and yield training methods requiring fewer labeled samples. Recent works use generative model approaches to produce the deep net’s input given the value of a hidden layer several levels above. However, there is no accompanying “proof of correctness” for the(More)