Minimizing finite sums with the stochastic average gradient
TLDR
We analyze the stochastic average gradient (SAG) method for optimizing the sum of a finite number of smooth convex functions, which achieves a faster convergence rate than black-box SG methods.
  • 814
  • 119
A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets
TLDR
We propose a new stochastic gradient method for optimizing the sum of a finite set of smooth functions, where the sum is strongly convex.
  • 593
  • 87
Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering
TLDR
This paper provides a unified framework for extending Local Linear Embedding (LLE), Isomap, Laplacian Eigenmaps, Multi-Dimensional Scaling (for dimensionality reduction) as well as Spectral Clustering.
  • 1,027
  • 66
Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization
TLDR
We consider the problem of optimizing the sum of a smooth convex function and a non-smooth convex term using proximal-gradient methods, where an error is present in the calculation of the gradient of the smooth term.
  • 414
  • 54
Representational Power of Restricted Boltzmann Machines and Deep Belief Networks
TLDR
We first prove that adding hidden units yields strictly improved modeling power, while a second theorem shows that RBMs are universal approximators of discrete distributions, a property similar to neural networks with one hidden layer.
  • 592
  • 40
A latent factor model for highly multi-relational data
TLDR
In this paper, we propose a method for modeling large multi-relational datasets, with possibly thousands of relations.
  • 315
  • 36
Learning Eigenfunctions Links Spectral Embedding and Kernel PCA
TLDR
We show a direct relation between spectral embedding methods and kernel principal components analysis and how both are special cases of a more general learning problem: learning the principal eigenfunctions of an operator defined from a kernel and the unknown data-generating density.
  • 292
  • 30
Ask the locals: Multi-way local pooling for image recognition
TLDR
We propose to apply this idea in its simplest form to an object recognition system based on the spatial pyramid framework, to increase the performance of small dictionaries with very little added engineering.
  • 275
  • 26
Topmoumoute Online Natural Gradient Algorithm
TLDR
We develop an efficient, general, online approximation to the natural gradient descent which is suited to large scale problems.
  • 164
  • 17
Label Propagation and Quadratic Criterion
  • 106
  • 12