Deep Sets
- M. Zaheer, Satwik Kottur, Siamak Ravanbakhsh, B. Póczos, R. Salakhutdinov, Alex Smola
- Computer Science, Mathematics · NIPS
- 10 March 2017
The main theorem characterizes permutation-invariant objective functions, providing a family of functions to which any permutation-invariant objective must belong; this enables the design of a deep network architecture that operates on sets and can be deployed in a variety of scenarios, including both unsupervised and supervised learning tasks.
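The decomposition behind that theorem is easy to illustrate. Below is a minimal sketch, assuming stand-in single-layer maps for φ and ρ and illustrative dimensions; it is not the paper's implementation, only the sum-decomposition form f(X) = ρ(Σₓ φ(x)) together with a check that the output is unchanged under permutation of the set.

```python
# Minimal sketch of the Deep Sets decomposition f(X) = rho(sum_x phi(x)),
# the form the main theorem assigns to permutation-invariant functions.
# phi and rho are stand-in single-layer maps; all dimensions are illustrative.
import numpy as np

rng = np.random.default_rng(0)
W_phi = rng.standard_normal((5, 16))   # phi: R^5 -> R^16
W_rho = rng.standard_normal((16, 1))   # rho: R^16 -> R

def f(X):
    # X: (set_size, 5); sum-pooling makes the output order-independent
    phi = np.tanh(X @ W_phi)           # per-element embedding
    pooled = phi.sum(axis=0)           # permutation-invariant aggregation
    return np.tanh(pooled @ W_rho)     # set-level readout

X = rng.standard_normal((7, 5))
perm = rng.permutation(7)
assert np.allclose(f(X), f(X[perm]))   # output is unchanged under permutation
```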
Gradient Descent Provably Optimizes Over-parameterized Neural Networks
Over-parameterization and random initialization jointly restrict every weight vector to stay close to its initialization for all iterations, which yields a strong-convexity-like property showing that gradient descent converges to the global optimum at a linear rate.
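For context, a hedged sketch of the kind of guarantee the summary describes, in assumed notation (the exact constants in the paper may differ):

```latex
% Sketch of the linear-rate statement, notation assumed rather than quoted:
% u(k) are the network's predictions at iteration k, y the labels, and
% \lambda_0 the least eigenvalue of the limiting Gram matrix at initialization.
\[
  \| u(k) - y \|_2^2 \;\le\; \Bigl(1 - \tfrac{\eta \lambda_0}{2}\Bigr)^{k} \, \| u(0) - y \|_2^2 ,
\]
% i.e. the training loss contracts geometrically, a global linear rate.
```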
Stochastic Variance Reduction for Nonconvex Optimization
This work proves non-asymptotic rates of convergence of SVRG for nonconvex optimization, and shows that it is provably faster than SGD and gradient descent.
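A minimal sketch of the SVRG update the analysis concerns, run here on an illustrative least-squares problem for concreteness (the paper's setting is nonconvex, but the variance-reduced update is the same); step size and problem data are assumptions:

```python
# SVRG sketch: the variance-reduced gradient v = grad_i(x) - grad_i(snapshot)
# + full_grad(snapshot) is unbiased and shrinks as x nears the snapshot.
import numpy as np

rng = np.random.default_rng(0)
A, b = rng.standard_normal((100, 10)), rng.standard_normal(100)

def grad_i(x, i):                            # gradient of one component
    return (A[i] @ x - b[i]) * A[i]

x, eta = np.zeros(10), 0.01
for epoch in range(20):
    snap = x.copy()
    full = (A.T @ (A @ snap - b)) / len(b)   # full gradient at the snapshot
    for _ in range(100):                     # inner stochastic loop
        i = rng.integers(len(b))
        v = grad_i(x, i) - grad_i(snap, i) + full
        x -= eta * v
```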
MMD GAN: Towards Deeper Understanding of Moment Matching Network
In evaluations on multiple benchmark datasets, including MNIST, CIFAR-10, CelebA, and LSUN, MMD-GAN significantly outperforms GMMN and is competitive with other representative GAN works.
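The statistic such generators minimize can be sketched directly. The snippet below uses a plain Gaussian kernel for illustration, whereas MMD GAN's contribution is to learn the kernel adversarially; data and bandwidth are assumptions:

```python
# Sketch of the (biased) squared-MMD estimator between real and generated samples.
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(X, Y, sigma=1.0):
    # Biased estimator of MMD^2; larger when the two distributions differ
    Kxx, Kyy, Kxy = (gaussian_kernel(a, b, sigma) for a, b in [(X, X), (Y, Y), (X, Y)])
    return Kxx.mean() + Kyy.mean() - 2 * Kxy.mean()

rng = np.random.default_rng(0)
real, fake = rng.normal(0, 1, (64, 2)), rng.normal(0.5, 1, (64, 2))
print(mmd2(real, fake))
```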
Proximal Stochastic Methods for Nonsmooth Nonconvex Finite-Sum Optimization
This work develops fast stochastic algorithms that provably converge to a stationary point with constant minibatches, and proves a global linear convergence rate for an interesting subclass of nonsmooth nonconvex functions, a result that subsumes several recent works.
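For a composite objective f(x) + h(x) with h nonsmooth, the proximal stochastic step pairs a variance-reduced gradient step on f (as in the SVRG sketch above) with the prox of h. The sketch below takes h(x) = λ‖x‖₁, whose prox is soft-thresholding; all problem data and constants are illustrative assumptions, not the paper's pseudocode:

```python
# Prox-SVRG sketch: variance-reduced gradient step, then proximal step on h.
import numpy as np

def prox_l1(x, t):                           # prox of t * ||.||_1
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

rng = np.random.default_rng(0)
A, b = rng.standard_normal((100, 10)), rng.standard_normal(100)
x, eta, lam = np.zeros(10), 0.01, 0.1

for epoch in range(20):
    snap = x.copy()
    full = (A.T @ (A @ snap - b)) / len(b)
    for _ in range(100):
        i = rng.integers(len(b))
        v = (A[i] @ x - b[i]) * A[i] - (A[i] @ snap - b[i]) * A[i] + full
        x = prox_l1(x - eta * v, eta * lam)  # proximal (soft-threshold) step
```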
Competence-based Curriculum Learning for Neural Machine Translation
- Emmanouil Antonios Platanios, Otilia Stretcu, Graham Neubig, B. Póczos, Tom Michael Mitchell
- Computer Science · NAACL
- 23 March 2019
A curriculum learning framework for NMT that reduces training time and the need for specialized heuristics or large batch sizes while yielding overall better performance, for both recurrent neural network models and Transformers.
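The core mechanism is easy to sketch: at step t a "competence" c(t) grows from c₀ to 1, and training samples only examples whose difficulty (e.g., the CDF of sentence length or rarity) is at most c(t). The square-root schedule below follows the paper's general recipe, but the constants and the uniform difficulty scores are illustrative assumptions:

```python
# Competence-based data selection sketch.
import numpy as np

def competence(t, T=10_000, c0=0.01):
    # Square-root growth from c0 to 1 over T steps
    return min(1.0, np.sqrt(t * (1 - c0 ** 2) / T + c0 ** 2))

rng = np.random.default_rng(0)
difficulty = rng.uniform(size=100_000)           # per-example difficulty in [0, 1]

def sample_batch(t, batch_size=32):
    eligible = np.flatnonzero(difficulty <= competence(t))
    return rng.choice(eligible, size=batch_size)  # train on "easy enough" data
```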
Neural Architecture Search with Bayesian Optimisation and Optimal Transport
- Kirthevasan Kandasamy, W. Neiswanger, J. Schneider, B. Póczos, E. Xing
- Computer Science · NeurIPS
- 11 February 2018
NASBOT is developed, a Gaussian-process-based BO framework for neural architecture search that outperforms other alternatives for architecture search in several cross-validation-based model selection tasks on multi-layer perceptrons and convolutional neural networks.
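As background, here is a generic GP-based Bayesian optimisation loop with a UCB acquisition over a toy continuous space. NASBOT's actual contribution, a kernel built from an optimal-transport distance between architectures, is elided; the objective, candidate pool, and use of scikit-learn's GP are all assumptions for illustration:

```python
# Generic GP-BO loop (UCB acquisition) over a toy 2-D design space.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def objective(x):                          # stand-in for validation accuracy
    return -np.sum((x - 0.3) ** 2)

rng = np.random.default_rng(0)
X = list(rng.uniform(size=(3, 2)))         # initial random designs
y = [objective(x) for x in X]

for _ in range(20):
    gp = GaussianProcessRegressor().fit(np.array(X), np.array(y))
    cand = rng.uniform(size=(256, 2))      # random candidate pool
    mu, sd = gp.predict(cand, return_std=True)
    x_next = cand[np.argmax(mu + 2.0 * sd)]  # UCB: exploit mean + explore std
    X.append(x_next); y.append(objective(x_next))

print(X[int(np.argmax(y))])                # best design found
```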
High Dimensional Bayesian Optimisation and Bandits via Additive Models
It is demonstrated that the method outperforms naive BO on additive functions and on several examples where the function is not additive, and it is proved that, for additive functions, the regret has only linear dependence on $D$ even though the function depends on all $D$ dimensions.
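The structural assumption behind that linear-in-$D$ regret can be written out; the notation below is assumed rather than quoted from the paper:

```latex
% Additive assumption: the objective decomposes over small disjoint
% groups of coordinates,
\[
  f(x) \;=\; \sum_{j=1}^{M} f^{(j)}\!\bigl(x^{(j)}\bigr),
  \qquad x^{(j)} \in \mathbb{R}^{d_j}, \quad \sum_{j} d_j = D, \quad d_j \ll D,
\]
% so BO can model each low-dimensional f^{(j)} with its own GP and escape
% the exponential-in-D behaviour of naive high-dimensional BO.
```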
Graph Neural Tangent Kernel: Fusing Graph Neural Networks with Graph Kernels
- S. Du, Kangcheng Hou, B. Póczos, R. Salakhutdinov, Ruosong Wang, Keyulu Xu
- Computer Science · NeurIPS
- 30 May 2019
A new class of graph kernels, Graph Neural Tangent Kernels (GNTKs), is presented; GNTKs correspond to infinitely wide multi-layer GNNs trained by gradient descent, enjoy the full expressive power of GNNs, and inherit the advantages of graph kernels (GKs).
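Once a GNTK Gram matrix over graphs is computed, it replaces GNN training entirely: prediction is plain kernel (ridge) regression. The sketch below shows only that downstream step; the GNTK recursion itself is elided, and the Gram matrix `K` is a stand-in built from random features for illustration:

```python
# Using a precomputed graph-kernel Gram matrix in kernel ridge regression.
import numpy as np

def kernel_ridge_predict(K_train, y_train, K_test_train, reg=1e-3):
    # Solve (K + reg I) alpha = y, then predict with test-train kernel rows
    alpha = np.linalg.solve(K_train + reg * np.eye(len(y_train)), y_train)
    return K_test_train @ alpha

rng = np.random.default_rng(0)
F = rng.standard_normal((30, 8))        # pretend per-graph features
K = F @ F.T                             # stand-in for a GNTK Gram matrix
y = rng.standard_normal(30)
print(kernel_ridge_predict(K[:20, :20], y[:20], K[20:, :20]))
```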
One Network to Solve Them All — Solving Linear Inverse Problems Using Deep Projection Models
- J. H. Rick Chang, Chun-Liang Li, B. Póczos, B. Kumar
- Computer Science, Mathematics · IEEE International Conference on Computer Vision (ICCV)
- 29 March 2017
This work proposes a general framework for training a single deep neural network that solves arbitrary linear inverse problems, demonstrating superior performance over traditional methods that use a wavelet sparsity prior while achieving performance comparable to specially trained networks on tasks including compressive sensing and pixel-wise inpainting.
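The plug-in idea is to solve y = Ax by alternating a data-consistency gradient step with a projection onto the set of natural signals; the paper trains one network to act as that projector across problems. In the sketch below a soft-threshold stands in for the learned projector, and the sparse signal, operator, step size, and iteration count are all illustrative assumptions:

```python
# Projected-gradient sketch for a linear inverse problem y = A x.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))            # compressive measurement operator
x_true = np.zeros(100); x_true[:5] = 1.0      # sparse stand-in signal
y = A @ x_true

def project(x):                               # stand-in for the learned projector
    return np.sign(x) * np.maximum(np.abs(x) - 0.01, 0.0)

x, eta = np.zeros(100), 1e-3
for _ in range(500):
    x = project(x - eta * A.T @ (A @ x - y))  # gradient step, then projection

print(np.linalg.norm(x - x_true))             # recovery error diagnostic
```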