Publications
Deep Sets
We study the problem of designing models for machine learning tasks defined on sets. In contrast to the traditional approach of operating on fixed-dimensional vectors, we consider objective functions …
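A permutation-invariant set function of the form f(X) = ρ(Σ_x φ(x)) is the construction studied in this line of work. Below is a minimal NumPy sketch (layer sizes and random weights are arbitrary, chosen only for illustration) that checks the output is unchanged when the set is reordered.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative weights for the per-element encoder phi and the set-level decoder rho.
    W_phi = rng.normal(size=(5, 16))   # element dim 5 -> latent dim 16
    W_rho = rng.normal(size=(16, 1))   # latent dim 16 -> scalar output

    def phi(x):
        """Per-element embedding (one linear layer + ReLU)."""
        return np.maximum(W_phi.T @ x, 0.0)

    def rho(z):
        """Decoder applied to the pooled representation."""
        return W_rho.T @ z

    def deep_set(X):
        """f(X) = rho(sum_x phi(x)); the sum-pooling makes f permutation invariant."""
        return rho(sum(phi(x) for x in X))

    X = [rng.normal(size=5) for _ in range(7)]          # a set of 7 elements
    X_perm = [X[i] for i in rng.permutation(len(X))]    # the same set, reordered
    print(np.allclose(deep_set(X), deep_set(X_perm)))   # True
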
Stochastic Variance Reduction for Nonconvex Optimization
We study nonconvex finite-sum problems and analyze stochastic variance reduced gradient (SVRG) methods for them. SVRG and related methods have recently surged into prominence for convex optimization …
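SVRG alternates between computing a full gradient at a snapshot point and taking cheap stochastic steps corrected by that snapshot gradient. A minimal NumPy sketch on a toy nonconvex finite sum is below; the loss, step size, and epoch lengths are illustrative choices, not the paper's settings.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 200, 10
    A = rng.normal(size=(n, d))
    y = rng.integers(0, 2, size=n).astype(float)

    def grad_i(w, i):
        """Gradient of the i-th component f_i(w) = (sigmoid(a_i.w) - y_i)^2 (nonconvex)."""
        s = 1.0 / (1.0 + np.exp(-A[i] @ w))
        return 2.0 * (s - y[i]) * s * (1.0 - s) * A[i]

    def full_grad(w):
        return np.mean([grad_i(w, i) for i in range(n)], axis=0)

    def svrg(w, step=0.1, epochs=20, m=2 * n):
        for _ in range(epochs):
            w_snap = w.copy()
            g_snap = full_grad(w_snap)            # full gradient at the snapshot
            for _ in range(m):
                i = rng.integers(n)
                # variance-reduced stochastic gradient
                v = grad_i(w, i) - grad_i(w_snap, i) + g_snap
                w = w - step * v
        return w

    w = svrg(np.zeros(d))
    print(np.linalg.norm(full_grad(w)))   # full-gradient norm at the result is small
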
Gradient Descent Provably Optimizes Over-parameterized Neural Networks
One of the mysteries in the success of neural networks is that randomly initialized first-order methods like gradient descent can achieve zero training loss even though the objective function is …
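The setting is full-batch gradient descent on a wide, randomly initialized two-layer ReLU network. The toy NumPy sketch below (synthetic data, width, step size, and the convention of fixing the second-layer weights are all assumptions made for illustration) shows the squared training loss being driven down toward zero.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, m = 20, 5, 2000                      # few samples, very wide hidden layer
    X = rng.normal(size=(n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    y = rng.normal(size=n)

    W = rng.normal(size=(m, d))                # random first-layer initialization
    a = rng.choice([-1.0, 1.0], size=m)        # fixed second layer, as in common analyses

    step = 0.1
    for t in range(1000):
        pre = X @ W.T                          # pre-activations, shape (n, m)
        pred = np.maximum(pre, 0.0) @ a / np.sqrt(m)
        r = pred - y                           # residuals
        # gradient of 0.5 * ||pred - y||^2 with respect to the first-layer weights W
        G = ((r[:, None] * (pre > 0)) * (a / np.sqrt(m))).T @ X
        W -= step * G
        if t % 250 == 0:
            print(t, 0.5 * np.sum(r ** 2))     # training loss heads toward zero
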
MMD GAN: Towards Deeper Understanding of Moment Matching Network
The generative moment matching network (GMMN) is a deep generative model that differs from the generative adversarial network (GAN) by replacing the discriminator with a two-sample test based on …
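The two-sample test underlying moment matching networks is the maximum mean discrepancy (MMD) between real and generated samples. A minimal NumPy sketch of the unbiased MMD^2 estimator with a Gaussian kernel follows; the bandwidth is arbitrary, and MMD GAN itself additionally learns the kernel adversarially, which is not shown here.

    import numpy as np

    def gaussian_kernel(X, Y, sigma=1.0):
        """k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) for all pairs of rows."""
        d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
        return np.exp(-d2 / (2 * sigma**2))

    def mmd2_unbiased(X, Y, sigma=1.0):
        """Unbiased estimate of MMD^2 between samples X ~ P and Y ~ Q."""
        m, n = len(X), len(Y)
        Kxx = gaussian_kernel(X, X, sigma)
        Kyy = gaussian_kernel(Y, Y, sigma)
        Kxy = gaussian_kernel(X, Y, sigma)
        # drop the diagonal terms for the unbiased within-sample averages
        term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
        term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
        return term_x + term_y - 2 * Kxy.mean()

    rng = np.random.default_rng(0)
    real = rng.normal(0.0, 1.0, size=(500, 2))
    fake = rng.normal(0.5, 1.0, size=(500, 2))       # shifted "generator" samples
    print(mmd2_unbiased(real, fake))                 # larger when the distributions differ
    print(mmd2_unbiased(real, rng.normal(0.0, 1.0, size=(500, 2))))  # close to zero
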
Proximal Stochastic Methods for Nonsmooth Nonconvex Finite-Sum Optimization
We analyze stochastic algorithms for optimizing nonconvex, nonsmooth finite-sum problems, where the nonsmooth part is convex. Surprisingly, unlike the smooth case, our knowledge of this fundamental …
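For a composite objective F(w) = (1/n) Σ_i f_i(w) + h(w) with smooth f_i and convex nonsmooth h, the basic step is a (variance-reduced) stochastic gradient step followed by the proximal operator of h. The NumPy sketch below uses h = λ‖w‖₁, whose prox is soft-thresholding; the quadratic components, step size, and epoch length are illustrative rather than the paper's choices.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, lam = 100, 20, 0.05
    A = rng.normal(size=(n, d))
    b = rng.normal(size=n)

    def grad_i(w, i):
        """Gradient of the smooth component f_i(w) = 0.5 * (a_i.w - b_i)^2."""
        return (A[i] @ w - b[i]) * A[i]

    def full_grad(w):
        return A.T @ (A @ w - b) / n

    def prox_l1(w, t):
        """Proximal operator of t * lam * ||.||_1: soft-thresholding."""
        return np.sign(w) * np.maximum(np.abs(w) - t * lam, 0.0)

    def prox_svrg(w, step=0.02, epochs=30, m=2 * n):
        for _ in range(epochs):
            w_snap, g_snap = w.copy(), full_grad(w)
            for _ in range(m):
                i = rng.integers(n)
                v = grad_i(w, i) - grad_i(w_snap, i) + g_snap   # variance-reduced gradient
                w = prox_l1(w - step * v, step)                 # gradient step, then prox
        return w

    w = prox_svrg(np.zeros(d))
    print(np.count_nonzero(w), "of", d, "coordinates nonzero")  # the l1 prox typically zeroes several
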
High Dimensional Bayesian Optimisation and Bandits via Additive Models
Bayesian Optimisation (BO) is a technique for optimising a $D$-dimensional function that is typically expensive to evaluate. While there have been many successes for BO in low dimensions, …
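The modeling idea is an additive decomposition f(x) = Σ_j f^(j)(x^(j)) over small disjoint groups of coordinates, so the Gaussian-process surrogate uses a sum of low-dimensional kernels. A short NumPy sketch of such an additive squared-exponential kernel is below; the grouping and length-scale are made up for illustration.

    import numpy as np

    def se_kernel(X1, X2, lengthscale=1.0):
        """Squared-exponential kernel on the given coordinates."""
        d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
        return np.exp(-d2 / (2 * lengthscale**2))

    def additive_kernel(X1, X2, groups):
        """Sum of low-dimensional SE kernels, one per disjoint group of coordinates."""
        return sum(se_kernel(X1[:, g], X2[:, g]) for g in groups)

    rng = np.random.default_rng(0)
    D = 12
    groups = [np.arange(0, 4), np.arange(4, 8), np.arange(8, 12)]  # illustrative grouping
    X = rng.uniform(size=(30, D))
    K = additive_kernel(X, X, groups)
    print(K.shape, np.all(np.linalg.eigvalsh(K + 1e-6 * np.eye(30)) > 0))  # valid PSD kernel
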
Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima
We consider the problem of learning a one-hidden-layer neural network with a non-overlapping convolutional layer and ReLU activation function, i.e., $f(\mathbf{Z}; \mathbf{w}, \mathbf{a}) = \sum_j \dots$
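A hedged sketch of the model class suggested by the title and the truncated formula: a single filter w applied to non-overlapping patches Z_j of the input, a ReLU, and second-layer weights a. The exact parameterization and assumptions in the paper may differ.

    import numpy as np

    def one_hidden_layer_cnn(Z, w, a):
        """f(Z; w, a) = sum_j a_j * relu(w . Z_j) over non-overlapping patches Z_j."""
        return sum(a_j * max(w @ z_j, 0.0) for a_j, z_j in zip(a, Z))

    rng = np.random.default_rng(0)
    k, p = 4, 6                       # number of patches, patch size (illustrative)
    Z = rng.normal(size=(k, p))       # non-overlapping patches of one input
    w = rng.normal(size=p)            # shared convolutional filter
    a = rng.normal(size=k)            # second-layer weights
    print(one_hidden_layer_cnn(Z, w, a))
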
Neural Architecture Search with Bayesian Optimisation and Optimal Transport
Bayesian Optimisation (BO) refers to a class of methods for global optimisation of a function $f$ which is only accessible via point evaluations. It is typically used in settings where $f$ is …
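The snippet here only gets as far as defining BO itself, so the sketch below shows a generic BO loop (GP surrogate plus expected-improvement acquisition) on a 1-D toy function. It is a textbook illustration of BO, not the paper's architecture-search method or its optimal-transport distance between networks.

    import numpy as np
    from scipy.stats import norm

    def f(x):                                        # toy "expensive" black-box function
        return np.sin(3 * x) + 0.5 * x

    def se(a, b, ls=0.3):
        return np.exp(-(a[:, None] - b[None, :])**2 / (2 * ls**2))

    def gp_posterior(Xobs, yobs, Xcand, noise=1e-6):
        """GP posterior mean and std at candidate points, zero-mean prior, SE kernel."""
        K = se(Xobs, Xobs) + noise * np.eye(len(Xobs))
        Ks = se(Xobs, Xcand)
        Kinv = np.linalg.inv(K)
        mu = Ks.T @ Kinv @ yobs
        var = 1.0 - np.sum(Ks * (Kinv @ Ks), axis=0)
        return mu, np.sqrt(np.maximum(var, 1e-12))

    def expected_improvement(mu, sigma, best):
        z = (mu - best) / sigma
        return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

    rng = np.random.default_rng(0)
    Xobs = rng.uniform(-2, 2, size=3)
    yobs = f(Xobs)
    Xcand = np.linspace(-2, 2, 200)
    for _ in range(10):                              # BO loop: fit, acquire, evaluate
        mu, sigma = gp_posterior(Xobs, yobs, Xcand)
        x_next = Xcand[np.argmax(expected_improvement(mu, sigma, yobs.max()))]
        Xobs, yobs = np.append(Xobs, x_next), np.append(yobs, f(x_next))
    print(Xobs[np.argmax(yobs)], yobs.max())         # best point found so far
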
On Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants
We study optimization algorithms based on variance reduction for stochastic gradient descent (SGD). Remarkable recent progress has been made in this direction through the development of algorithms like …
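One member of this family is SAGA, which stores the most recent gradient of each component and uses the table average as a control variate. A sequential NumPy sketch is below; the asynchronous, parallel execution analyzed in this line of work is not shown, and the least-squares problem and step size are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 200, 10
    A = rng.normal(size=(n, d))
    b = rng.normal(size=n)

    def grad_i(w, i):
        """Gradient of f_i(w) = 0.5 * (a_i.w - b_i)^2."""
        return (A[i] @ w - b[i]) * A[i]

    def saga(w, step=0.01, iters=20000):
        """SAGA: keep a table of the last gradient seen for each component."""
        table = np.array([grad_i(w, i) for i in range(n)])   # stored gradients
        avg = table.mean(axis=0)
        for _ in range(iters):
            i = rng.integers(n)
            g = grad_i(w, i)
            w = w - step * (g - table[i] + avg)               # variance-reduced step
            avg = avg + (g - table[i]) / n                    # maintain the running average
            table[i] = g
        return w

    w = saga(np.zeros(d))
    print(np.linalg.norm(A.T @ (A @ w - b) / n))              # full-gradient norm shrinks toward zero
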
Variance Reduction in Stochastic Gradient Langevin Dynamics
Stochastic gradient-based Monte Carlo methods such as stochastic gradient Langevin dynamics are useful tools for posterior inference on large-scale datasets in many machine learning applications. …
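Stochastic gradient Langevin dynamics takes a stochastic gradient step on the log-posterior and injects Gaussian noise scaled to the step size; variance reduction replaces the plain minibatch gradient with an SVRG-style control-variate estimate. The NumPy sketch below does this for the toy problem of sampling the posterior mean of a Gaussian (model, prior, step size, and epoch lengths are illustrative).

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000
    data = rng.normal(2.0, 1.0, size=n)              # observations x_i ~ N(theta, 1)

    def grad_batch(theta, idx):
        """Minibatch estimate of grad log p(theta | x) with a N(0, 1) prior on theta."""
        return -theta + (n / len(idx)) * np.sum(data[idx] - theta)

    def full_grad(theta):
        return -theta + np.sum(data - theta)

    def svrg_ld(theta=0.0, step=1e-4, epochs=10, m=500, batch=32):
        """Langevin dynamics with an SVRG-style control variate on the stochastic gradient."""
        samples = []
        for _ in range(epochs):
            theta_snap, g_snap = theta, full_grad(theta)        # snapshot point
            for _ in range(m):
                idx = rng.integers(n, size=batch)
                g = grad_batch(theta, idx) - grad_batch(theta_snap, idx) + g_snap
                theta = theta + 0.5 * step * g + np.sqrt(step) * rng.normal()  # noise injection
                samples.append(theta)
        return np.array(samples)

    samples = svrg_ld()
    print(samples[len(samples) // 2:].mean())   # close to the posterior mean of theta (about 2.0)
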