Publications
Deep Sets
TLDR
The main theorem characterizes permutation invariant objective functions and provides a family of functions to which any permutation equivariant objective function must belong, which enables the design of a deep network architecture that can operate on sets and can be deployed in a variety of scenarios, including both unsupervised and supervised learning tasks.
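A minimal PyTorch sketch of the sum-decomposition this characterization licenses; the modules `phi` and `rho` are illustrative placeholders rather than the paper's exact architecture, and the only essential ingredient is the sum over the set axis, which makes the output invariant to element order.

```python
import torch
import torch.nn as nn

class DeepSet(nn.Module):
    """Permutation-invariant set function f(X) = rho(sum_x phi(x)).

    `phi` and `rho` are hypothetical MLPs standing in for the paper's
    encoder and decoder; summing over the set axis removes any dependence
    on the order of the elements.
    """
    def __init__(self, in_dim: int, hidden: int, out_dim: int):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, out_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, set_size, in_dim); pooling over dim=1 discards ordering
        return self.rho(self.phi(x).sum(dim=1))
```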
Gradient Descent Provably Optimizes Over-parameterized Neural Networks
TLDR
Over-parameterization and random initialization jointly restrict every weight vector to stay close to its initialization for all iterations, which allows a strong convexity-like property to be exploited to show that gradient descent converges to a global optimum at a linear rate.
Stochastic Variance Reduction for Nonconvex Optimization
TLDR
This work proves non-asymptotic rates of convergence of SVRG for nonconvex optimization and shows that it is provably faster than SGD and gradient descent.
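A minimal NumPy sketch of the SVRG update analysed in this line of work, assuming a user-supplied per-component gradient `grad_i`; the step size, epoch count, and inner-loop length are illustrative defaults, not the paper's settings.

```python
import numpy as np

def svrg(x0, grad_i, n, step=0.01, epochs=10, inner_steps=None, rng=None):
    """Stochastic variance reduced gradient (SVRG) sketch.

    grad_i(x, i) returns the gradient of the i-th component function f_i at x.
    Each epoch computes a full gradient at a snapshot point and uses it to
    correct the stochastic gradients, reducing their variance.
    """
    rng = np.random.default_rng() if rng is None else rng
    m = inner_steps or n
    x = x0.copy()
    for _ in range(epochs):
        snapshot = x.copy()
        full_grad = np.mean([grad_i(snapshot, i) for i in range(n)], axis=0)
        for _ in range(m):
            i = rng.integers(n)
            # variance-reduced gradient estimate
            g = grad_i(x, i) - grad_i(snapshot, i) + full_grad
            x -= step * g
    return x
```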
MMD GAN: Towards Deeper Understanding of Moment Matching Network
TLDR
In the evaluation on multiple benchmark datasets, including MNIST, CIFAR-10, CelebA and LSUN, MMD GAN significantly outperforms GMMN and is competitive with other representative GAN works.
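A hedged sketch of the squared-MMD estimate that moment-matching networks minimise. MMD GAN additionally composes the kernel with an adversarially learned feature map, which is omitted here; the Gaussian kernel and the bandwidth `sigma` are illustrative choices.

```python
import torch

def gaussian_kernel(x, y, sigma=1.0):
    # x: (n, d), y: (m, d); pairwise Gaussian (RBF) kernel matrix
    dist2 = torch.cdist(x, y, p=2) ** 2
    return torch.exp(-dist2 / (2 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    """Biased estimate of the squared MMD between samples x and y."""
    k_xx = gaussian_kernel(x, x, sigma).mean()
    k_yy = gaussian_kernel(y, y, sigma).mean()
    k_xy = gaussian_kernel(x, y, sigma).mean()
    return k_xx + k_yy - 2 * k_xy
```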
Proximal Stochastic Methods for Nonsmooth Nonconvex Finite-Sum Optimization
TLDR
This work develops fast stochastic algorithms that provably converge to a stationary point with constant minibatches, and proves a global linear convergence rate for an interesting subclass of nonsmooth nonconvex functions, which subsumes several recent works.
Neural Architecture Search with Bayesian Optimisation and Optimal Transport
TLDR
NASBOT is developed, a Gaussian process based BO framework for neural architecture search which outperforms other alternatives for architecture search in several cross-validation based model selection tasks on multi-layer perceptrons and convolutional neural networks.
High Dimensional Bayesian Optimisation and Bandits via Additive Models
TLDR
It is demonstrated that the method outperforms naive BO on additive functions and on several examples where the function is not additive, and it is proved that, for additive functions, the regret has only linear dependence on $D$ even though the function depends on all $D$ dimensions.
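A small sketch of the additive GP kernel this method builds on, assuming a known partition of the $D$ coordinates into disjoint low-dimensional groups (in practice the decomposition is itself optimised); the RBF base kernel and the fixed lengthscale are illustrative.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    # pairwise squared distances between rows of a and b, then an RBF kernel
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * lengthscale ** 2))

def additive_kernel(X1, X2, groups, lengthscale=1.0):
    """Additive GP kernel k(x, x') = sum_j k_j(x^(j), x'^(j)).

    `groups` is an assumed partition of the D coordinates into disjoint
    index sets; the GP then models f as a sum of low-dimensional functions,
    one per group, which is what keeps the regret dependence on D linear.
    """
    return sum(rbf(X1[:, g], X2[:, g], lengthscale) for g in groups)

# Example: a 5-dimensional function split into three low-dimensional groups.
X = np.random.rand(8, 5)
K = additive_kernel(X, X, groups=[[0, 1], [2], [3, 4]])
```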
One Network to Solve Them All — Solving Linear Inverse Problems Using Deep Projection Models
TLDR
This work proposes a general framework to train a single deep neural network that solves arbitrary linear inverse problems and demonstrates superior performance over traditional methods using a wavelet sparsity prior, while achieving performance comparable to specially-trained networks on tasks including compressive sensing and pixel-wise inpainting.
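A hedged sketch of the idea: an iterative solver alternates a measurement-consistency step with a learned projection onto the set of natural images. The paper embeds the projector in an ADMM-style solver; the simpler projected-gradient loop and the name `project` below are illustrative assumptions.

```python
import numpy as np

def solve_inverse_problem(y, A, project, step=0.1, iters=100):
    """Projected-gradient sketch for a linear inverse problem y = A x.

    `project` stands in for a trained deep projection network (an assumption
    for this sketch); the gradient step enforces consistency with the
    measurements, the projection enforces the learned image prior.
    """
    x = A.T @ y  # simple initialisation from the adjoint
    for _ in range(iters):
        x = x - step * (A.T @ (A @ x - y))  # data-fidelity gradient step
        x = project(x)                      # learned projection step
    return x
```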
Competence-based Curriculum Learning for Neural Machine Translation
TLDR
A curriculum learning framework for NMT that reduces training time, reduces the need for specialized heuristics or large batch sizes, and results in overall better performance for both recurrent neural network models and Transformers.
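A small sketch of a competence schedule of the kind used in this framework, assuming example difficulties have been CDF-normalised to [0, 1] (e.g., from sentence length or word rarity); the square-root schedule is one of the proposed variants and the constants are illustrative.

```python
import math

def competence(t, total_steps, c0=0.01):
    """Square-root competence schedule: grows from c0 to 1 over total_steps."""
    return min(1.0, math.sqrt(t * (1.0 - c0 ** 2) / total_steps + c0 ** 2))

def in_curriculum(example_difficulty, t, total_steps, c0=0.01):
    # An example becomes eligible for sampling once its (CDF-normalised)
    # difficulty falls below the model's current competence.
    return example_difficulty <= competence(t, total_steps, c0)
```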
Variance Reduction in Stochastic Gradient Langevin Dynamics
TLDR
Techniques for reducing variance in stochastic gradient Langevin dynamics are presented, yielding novel stochastic Monte Carlo methods that improve performance by reducing the variance in the stochastic gradient.
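A hedged sketch of an SVRG-style correction applied to a stochastic gradient Langevin step, assuming user-supplied per-example gradients `grad_i` and a precomputed full-data gradient at a snapshot point; the names and step-size/noise conventions are illustrative.

```python
import numpy as np

def svrg_ld_step(theta, snapshot, full_grad, grad_i, batch, n_total, step, rng):
    """One variance-reduced Langevin dynamics step (sketch).

    grad_i(x, i) is the gradient of the i-th negative log-likelihood term;
    full_grad is the sum of grad_i(snapshot, i) over all n_total terms.
    The SVRG-corrected gradient estimate is plugged into the usual Langevin
    update with Gaussian noise of variance 2 * step.
    """
    correction = np.mean(
        [grad_i(theta, i) - grad_i(snapshot, i) for i in batch], axis=0)
    # (n_total / |batch|) * sum of corrections + full gradient at the snapshot
    g = n_total * correction + full_grad
    noise = rng.normal(scale=np.sqrt(2 * step), size=theta.shape)
    return theta - step * g + noise
```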