Publications
Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization
TLDR: We investigate the optimality of SGD for strongly convex stochastic optimization.
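A minimal sketch of the setting this paper studies, assuming nothing beyond the abstract: plain SGD with the standard step sizes eta_t = 1/(lam*t) on a lam-strongly-convex stochastic objective, tracking both the last iterate and the running average. The function names and the toy objective are illustrative, not the paper's code.

```python
import numpy as np

def sgd_strongly_convex(grad_sample, d, lam, T, rng):
    """Plain SGD with step sizes eta_t = 1/(lam*t), the standard choice in the
    lam-strongly-convex setting; returns the last iterate and the running average."""
    w = np.zeros(d)
    avg = np.zeros(d)
    for t in range(1, T + 1):
        g = grad_sample(w, rng)          # unbiased stochastic (sub)gradient
        w = w - g / (lam * t)            # eta_t = 1 / (lam * t)
        avg += (w - avg) / t             # running average of iterates
    return w, avg

# Toy lam-strongly-convex objective: E[ 0.5*lam*||w||^2 + (w.x - y)^2 ]
rng = np.random.default_rng(0)
lam, d = 0.1, 5
w_star = rng.normal(size=d)

def noisy_grad(w, rng):
    x = rng.normal(size=d)
    y = x @ w_star + 0.01 * rng.normal()
    return lam * w + 2 * (w @ x - y) * x

w_last, w_avg = sgd_strongly_convex(noisy_grad, d, lam, 20_000, rng)
```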
Optimal Distributed Online Prediction Using Mini-Batches
TLDR: We present the distributed mini-batch algorithm, a method of converting many serial gradient-based online prediction algorithms into distributed algorithms, achieving an asymptotically linear speed-up over multiple processors.
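A single-process sketch of the mini-batch idea described above, assuming a generic stochastic gradient oracle; `sample_grad`, `k`, `b`, and `eta` are illustrative names rather than the paper's interface.

```python
import numpy as np

def distributed_minibatch_step(w, sample_grad, k, b, eta, rng):
    """One round of the mini-batch idea, simulated in a single process: each of
    k 'workers' averages gradients over b // k fresh samples, the worker averages
    are combined (an all-reduce), and a single serial-style gradient step is applied."""
    per_worker = b // k
    worker_avgs = []
    for _ in range(k):                                # stands in for k parallel machines
        grads = [sample_grad(w, rng) for _ in range(per_worker)]
        worker_avgs.append(np.mean(grads, axis=0))
    g = np.mean(worker_avgs, axis=0)                  # combined mini-batch gradient
    return w - eta * g                                # one update per communication round
```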
Communication-Efficient Distributed Optimization using an Approximate Newton-type Method
TLDR: We present a novel Newton-type method for distributed optimization, which is particularly well suited for stochastic optimization and learning problems.
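As a hedged illustration only: the sketch below shows one flavor of approximate-Newton distributed update, specialized to quadratic local losses, where each machine preconditions the global gradient with its own local curvature before the steps are averaged. It is not necessarily the paper's exact formulation.

```python
import numpy as np

def newton_type_step_quadratic(w, local_A, local_b, eta=1.0, mu=0.0):
    """One communication round for quadratic local losses
    f_i(w) = 0.5 * w'A_i w - b_i'w: compute the global gradient, have every
    machine solve against its local curvature (A_i + mu*I), then average the steps."""
    d = len(w)
    grads = [A @ w - b for A, b in zip(local_A, local_b)]
    g = np.mean(grads, axis=0)                              # global gradient (all-reduce)
    steps = [np.linalg.solve(A + mu * np.eye(d), g) for A in local_A]
    return w - eta * np.mean(steps, axis=0)                 # averaged locally preconditioned steps
```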
Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes
TLDR: We analyze Stochastic Gradient Descent, one of the simplest and most popular stochastic optimization methods, in the non-smooth setting, and study averaging schemes that attain the optimal convergence rates.
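One averaging scheme discussed in this line of work is alpha-suffix averaging (average only the last alpha-fraction of iterates). A minimal sketch, with a placeholder subgradient oracle and step-size rule:

```python
import numpy as np

def sgd_suffix_average(subgrad, w0, T, step, alpha=0.5):
    """SGD for a possibly non-smooth objective, returning both the last iterate
    and the alpha-suffix average (mean of the last alpha*T iterates), which
    avoids the lag introduced by averaging everything."""
    w = np.array(w0, dtype=float)
    start = int((1 - alpha) * T)
    suffix = np.zeros_like(w)
    count = 0
    for t in range(1, T + 1):
        w = w - step(t) * subgrad(w)        # step(t) could be, e.g., c / sqrt(t)
        if t > start:
            count += 1
            suffix += (w - suffix) / count  # running mean over the suffix
    return w, suffix

# Example: minimize the non-smooth f(w) = |w - 3| with noisy subgradients
rng = np.random.default_rng(0)
subgrad = lambda w: np.sign(w - 3.0) + 0.1 * rng.normal(size=w.shape)
w_last, w_suffix = sgd_suffix_average(subgrad, [0.0], T=10_000, step=lambda t: 1.0 / np.sqrt(t))
```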
Multi-Player Bandits - a Musical Chairs Approach
TLDR: We consider a variant of the stochastic multi-armed bandit problem, where multiple players simultaneously choose from the same set of arms and may collide, receiving no reward.
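A sketch of the chair-grabbing step that gives the approach its name: after an exploration phase has ranked the arms, each player repeatedly samples a random arm among its estimated top arms and fixes on the first one it pulls without a collision. The `try_pull` interface and the constants are assumptions for illustration.

```python
import random

def musical_chairs_phase(top_arms, try_pull, max_rounds=10_000):
    """Repeatedly pull a uniformly random arm from the estimated top arms; the
    first time a pull comes back collision-free, 'sit' on that arm and keep
    playing it for the rest of the game."""
    for _ in range(max_rounds):
        arm = random.choice(top_arms)
        reward, collided = try_pull(arm)   # assumed environment interface
        if not collided:
            return arm                     # chair claimed
    return None                            # failed to find a free arm in time
```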
Size-Independent Sample Complexity of Neural Networks
TLDR: We study the sample complexity of learning neural networks by providing new bounds on their Rademacher complexity, assuming norm constraints on the parameter matrix of each layer.
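To make the bounded quantity concrete, here is a toy Monte Carlo estimate of empirical Rademacher complexity for a norm-bounded linear class (not the paper's neural-network class), where the supremum over the class has a closed form:

```python
import numpy as np

def empirical_rademacher_linear(X, B, n_draws=2000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity of the
    norm-bounded linear class {x -> <w, x> : ||w||_2 <= B} on a sample X of
    shape (n, d); here sup_w (1/n) sum_i s_i <w, x_i> = B * ||sum_i s_i x_i||_2 / n."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    vals = [B * np.linalg.norm(rng.choice([-1.0, 1.0], size=n) @ X) / n
            for _ in range(n_draws)]
    return float(np.mean(vals))

# More data points -> smaller complexity, roughly like 1/sqrt(n)
X = np.random.default_rng(1).normal(size=(200, 10))
print(empirical_rademacher_linear(X, B=1.0))
```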
Adaptively Learning the Crowd Kernel
TLDR: We introduce a new scale-invariant kernel approximation model that, given n objects, learns a similarity matrix over all n^2 pairs from crowdsourced data alone.
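A generic sketch, not the paper's specific probabilistic model: learning an embedding (and hence a similarity matrix) from crowdsourced triplets of the form "i is more similar to j than to k" via a plain hinge loss on squared distances. All names and constants are illustrative.

```python
import numpy as np

def embed_from_triplets(n, triplets, dim=2, lr=0.05, epochs=200, margin=1.0, seed=0):
    """Learn an embedding of n objects from triplets (i, j, k), each meaning
    'i is more similar to j than to k', by descending a hinge loss on squared
    distances; the similarity matrix is then X @ X.T."""
    rng = np.random.default_rng(seed)
    X = 0.1 * rng.normal(size=(n, dim))
    for _ in range(epochs):
        for i, j, k in triplets:
            d_ij = np.sum((X[i] - X[j]) ** 2)
            d_ik = np.sum((X[i] - X[k]) ** 2)
            if d_ij + margin > d_ik:                # triplet violated (or within margin)
                gi = 2 * (X[k] - X[j])              # d/dX_i of (d_ij - d_ik)
                gj = 2 * (X[j] - X[i])              # d/dX_j of d_ij
                gk = 2 * (X[i] - X[k])              # d/dX_k of -d_ik
                X[i] -= lr * gi
                X[j] -= lr * gj
                X[k] -= lr * gk
    return X @ X.T                                  # learned n x n similarity matrix
```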
Stochastic Convex Optimization
TLDR: We study stochastic convex optimization and uncover a surprisingly different situation in this general setting: although the problem is learnable (e.g. using online-to-batch conversions), no uniform convergence holds.
Learnability, Stability and Uniform Convergence
TLDR: The problem of characterizing learnability is the most basic question of statistical learning theory; we study how it relates to stability and to uniform convergence.
The Power of Depth for Feedforward Neural Networks
TLDR: We show that there is a simple (approximately radial) function on $\mathbb{R}^d$, expressible by a small 3-layer feedforward neural network, which cannot be approximated by any 2-layer network to more than a certain constant accuracy unless its width is exponential in the dimension.