Publications
Generalization and equilibrium in generative adversarial nets (GANs) (invited talk)
Generative Adversarial Networks (GANs) have become one of the dominant methods for fitting generative models to complicated real-life data, and even found unusual uses such as designing good …
Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers
TLDR
It is proved that overparameterized neural networks trained by SGD (stochastic gradient descent) or its variants can learn some notable concept classes, including two- and three-layer networks with fewer parameters and smooth activations, in polynomial time using polynomially many samples.
A Latent Variable Model Approach to PMI-based Word Embeddings
TLDR
A new generative model is proposed, a dynamic version of the log-linear topic model of Mnih and Hinton (2007), whose prior yields closed-form expressions for word statistics; it is also shown that the latent word vectors are fairly uniformly dispersed in space.
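To give the flavor of those closed-form expressions, here is a rough sketch in the paper's notation, where v_w is the latent vector of word w, d the embedding dimension, and Z a partition-function constant; up to small error terms the model gives approximately:

    \log p(w, w') \approx \frac{\|v_w + v_{w'}\|^2}{2d} - 2\log Z,
    \qquad
    \mathrm{PMI}(w, w') \approx \frac{\langle v_w, v_{w'} \rangle}{d}.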
Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data
TLDR
It is proved that, when the data comes from mixtures of well-separated distributions, SGD learns a network with small generalization error even though the network has enough capacity to fit arbitrary labels.
Scalable Kernel Methods via Doubly Stochastic Gradients
TLDR
An approach that scales up kernel methods using a novel concept called "doubly stochastic functional gradients", based on the fact that many kernel methods can be expressed as convex optimization problems; the approach can readily scale kernel methods up to regimes that are dominated by neural nets.
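As a rough illustration of the idea, a minimal sketch of a doubly stochastic update for regression with an RBF kernel is given below: each iteration samples both a data point and a random Fourier feature. The feature construction, the hyperparameter names (sigma, lam, n_iters), and the step-size schedule are illustrative assumptions, not the paper's exact algorithm.

    import numpy as np

    def phi(x, w, b):
        # one random Fourier feature approximating an RBF kernel
        return np.sqrt(2.0) * np.cos(x @ w + b)

    def doubly_stochastic_sgd(X, y, sigma=1.0, lam=1e-3, n_iters=2000, seed=0):
        rng = np.random.default_rng(seed)
        n, d = X.shape
        Ws, Bs, alphas = [], [], []
        for t in range(1, n_iters + 1):
            i = rng.integers(n)                      # stochasticity 1: sample a data point
            w = rng.normal(0.0, 1.0 / sigma, d)      # stochasticity 2: sample a random feature
            b = rng.uniform(0.0, 2.0 * np.pi)
            # current prediction at x_i using the features sampled so far
            f_xi = sum(a * phi(X[i], wj, bj) for a, wj, bj in zip(alphas, Ws, Bs))
            step = 1.0 / (lam * t)                   # decaying step size
            alphas = [a * (1.0 - step * lam) for a in alphas]   # shrink old coefficients (regularization)
            alphas.append(-step * (f_xi - y[i]) * phi(X[i], w, b))
            Ws.append(w)
            Bs.append(b)
        def predict(x):
            return sum(a * phi(x, wj, bj) for a, wj, bj in zip(alphas, Ws, Bs))
        return predict

Each random feature is kept alongside its coefficient, so memory grows with the number of iterations rather than with the square of the dataset size, which is what lets the method avoid materializing the full kernel matrix.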
Clustering under Perturbation Resilience
TLDR
This paper presents an algorithm that can optimally cluster instances resilient to $(1 + \sqrt{2})$-factor perturbations, solving an open problem of Awasthi et al.
Linear Algebraic Structure of Word Senses, with Applications to Polysemy
TLDR
It is shown that multiple word senses reside in linear superposition within a word's embedding, and that simple sparse coding can recover vectors that approximately capture the individual senses.
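A hedged sketch of that recovery step, using off-the-shelf dictionary learning rather than the paper's exact pipeline; the file name, the number of atoms, and the sparsity level are illustrative assumptions.

    import numpy as np
    from sklearn.decomposition import DictionaryLearning

    V = np.load("word_vectors.npy")              # assumed (vocab_size, dim) embedding matrix
    coder = DictionaryLearning(n_components=2000,            # candidate "atom" directions
                               transform_algorithm="omp",
                               transform_n_nonzero_coefs=5)  # each word = sparse mix of atoms
    codes = coder.fit_transform(V)               # sparse coefficients per word
    atoms = coder.components_                    # directions that approximately capture senses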
Diverse Neural Network Learns True Target Functions
TLDR
This paper analyzes one-hidden-layer neural networks with ReLU activation and shows that, despite the non-convexity, networks with diverse units have no spurious local minima; it also suggests a novel regularization function that promotes unit diversity for potentially better generalization.
A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors
TLDR
A la carte embedding is introduced, a simple and general alternative to the usual word2vec-based approaches for building semantic feature vectors, based upon recent theoretical results for GloVe-like embeddings.
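A minimal sketch of the induction step under a common reading of the method: learn a linear map from average context embeddings to word embeddings on common words, then apply it to an unseen feature's context average. The file names and the least-squares solver are assumptions for illustration.

    import numpy as np

    V = np.load("word_vectors.npy")            # assumed (n_words, d) embeddings of common words
    C = np.load("avg_context_vectors.npy")     # assumed (n_words, d) average context embeddings
    A, *_ = np.linalg.lstsq(C, V, rcond=None)  # linear map with C @ A ~ V

    def induce(avg_context_vec):
        # embedding for a new word, n-gram, or feature from its contexts
        return avg_context_vec @ A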