Publications
Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization
TLDR
This paper investigates the optimality of SGD in a stochastic setting and shows that for smooth problems the algorithm attains the optimal O(1/T) rate; for non-smooth problems, however, the convergence rate with averaging can indeed be Ω(log(T)/T), and this is not just an artifact of the analysis.
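As a concrete, purely illustrative picture of this setting, the sketch below runs SGD with step size 1/(λt) on a synthetic smooth, strongly convex objective and compares the last iterate with the uniform average of the iterates; the toy objective, the constants, and the NumPy implementation are assumptions for the example, not the paper's construction.

```python
import numpy as np

# Toy illustration (not the paper's analysis): SGD with step size 1/(lambda*t)
# on the smooth, strongly convex objective F(w) = E[0.5*lambda*||w - z||^2],
# where z is a noisy observation of w*.
rng = np.random.default_rng(0)
d, lam, T = 10, 1.0, 10_000
w_star = rng.normal(size=d)

w = np.zeros(d)
w_avg = np.zeros(d)                        # running (uniform) average of the iterates
for t in range(1, T + 1):
    z = w_star + rng.normal(size=d)        # stochastic sample
    grad = lam * (w - z)                   # unbiased stochastic gradient
    w -= grad / (lam * t)                  # step size 1/(lambda * t)
    w_avg += (w - w_avg) / t               # incremental averaging

def suboptimality(v):
    return 0.5 * lam * np.dot(v - w_star, v - w_star)

print("last iterate :", suboptimality(w))
print("average      :", suboptimality(w_avg))
```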
Communication-Efficient Distributed Optimization using an Approximate Newton-type Method
TLDR
A novel Newton-type method for distributed optimization is presented; it is particularly well suited for stochastic optimization and learning problems, and enjoys a linear rate of convergence which provably improves with the data size.
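The sketch below illustrates one round-based iteration in the spirit of this approximate Newton-type scheme on distributed least squares: because the local losses are quadratic, each machine's regularized local subproblem reduces to the closed-form solve shown. The data split and the parameters mu and eta are assumptions for the example, and the quadratic specialization is mine; general losses require actually solving the local subproblems.

```python
import numpy as np

# Hedged sketch of an approximate-Newton-style distributed iteration on
# least squares.  Each round: machines share gradients once, then each solves a
# regularized local subproblem (closed form here because the loss is quadratic),
# and the local solutions are averaged.
rng = np.random.default_rng(1)
m, n, d = 4, 500, 10                       # machines, samples per machine, dimension
w_true = rng.normal(size=d)
X = [rng.normal(size=(n, d)) for _ in range(m)]
y = [Xi @ w_true + 0.1 * rng.normal(size=n) for Xi in X]
H = [Xi.T @ Xi / n for Xi in X]            # local Hessians (quadratic loss)

def grad(i, w):                            # gradient of machine i's least-squares loss
    return X[i].T @ (X[i] @ w - y[i]) / n

mu, eta = 1.0, 1.0                         # illustrative regularization / step parameters
w = np.zeros(d)
for _ in range(10):
    g = sum(grad(i, w) for i in range(m)) / m              # one communication round
    # Local solve: for quadratic losses the subproblem is a locally
    # preconditioned step; the results are averaged across machines.
    w = sum(w - eta * np.linalg.solve(H[i] + mu * np.eye(d), g)
            for i in range(m)) / m

print("distance to w_true:", np.linalg.norm(w - w_true))
```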
Optimal Distributed Online Prediction Using Mini-Batches
TLDR
This work presents the distributed mini-batch algorithm, a method for converting many serial gradient-based online prediction algorithms into distributed algorithms, and proves a regret bound for this method showing that it is asymptotically optimal for smooth convex loss functions and stochastic inputs.
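A minimal sketch of the distributed mini-batch idea follows, under illustrative assumptions (a synthetic least-squares loss, k simulated workers, a 1/√t step size): each worker computes a gradient on its share of the mini-batch, the gradients are aggregated, and a single serial-style update is applied per round.

```python
import numpy as np

# Minimal sketch of the distributed mini-batch pattern: k workers each evaluate
# gradients on their share of a mini-batch, the gradients are averaged, and one
# serial-style gradient update is applied per round.  The loss, batch size and
# step size are illustrative, not the paper's prescription.
rng = np.random.default_rng(2)
d, k, b, rounds = 10, 4, 64, 200           # dimension, workers, batch per worker, rounds
w_true = rng.normal(size=d)
w = np.zeros(d)

for t in range(1, rounds + 1):
    grads = []
    for _ in range(k):                     # in a real system these run in parallel
        X = rng.normal(size=(b, d))
        y = X @ w_true + 0.1 * rng.normal(size=b)
        grads.append(X.T @ (X @ w - y) / b)        # worker's local mini-batch gradient
    g = np.mean(grads, axis=0)             # aggregate (e.g. all-reduce) the gradients
    w -= g / np.sqrt(t)                    # one update per round, as in the serial method

print("distance to w_true:", np.linalg.norm(w - w_true))
```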
Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes
TLDR
The performance of SGD without non-trivial smoothness assumptions is investigated, along with running-average schemes that convert the SGD iterates into a solution with optimal optimization accuracy; a new and simple averaging scheme is proposed which not only attains optimal rates but can also be computed easily on the fly.
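The sketch below shows an on-the-fly weighted running average of the SGD iterates of the flavour studied here; the decay parameter eta, the exact weights, and the non-smooth toy example are illustrative assumptions rather than the paper's precise scheme.

```python
import numpy as np

# Hedged sketch: SGD plus a weighted running average that favours recent
# iterates and needs only O(d) memory, in the spirit of on-the-fly averaging.
def averaged_sgd(stoch_grad, w0, T, step_size, eta=3.0):
    w, w_bar = np.array(w0, float), np.array(w0, float)
    for t in range(1, T + 1):
        w = w - step_size(t) * stoch_grad(w)        # standard SGD step
        rho = (eta + 1.0) / (t + eta)               # weight on the newest iterate
        w_bar = (1.0 - rho) * w_bar + rho * w       # averaged solution, updated on the fly
    return w_bar

# Example: minimize the non-smooth E|w - z| with z ~ N(0, 1); the optimum
# (the median of z) is 0, and sign(w - z) is a stochastic subgradient.
rng = np.random.default_rng(3)
w_hat = averaged_sgd(lambda w: np.sign(w - rng.normal()), [5.0], 50_000,
                     step_size=lambda t: 1.0 / np.sqrt(t))
print("averaged solution:", w_hat)
```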
Learnability, Stability and Uniform Convergence
TLDR
This paper considers the General Learning Setting (introduced by Vapnik), which includes most statistical learning problems as special cases, and identifies stability as the key necessary and sufficient condition for learnability.
Size-Independent Sample Complexity of Neural Networks
TLDR
The sample complexity of learning neural networks is studied via new bounds on their Rademacher complexity, assuming norm constraints on the parameter matrix of each layer; under some additional assumptions, these bounds are fully independent of the network size.
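For intuition, the toy snippet below computes the kind of quantity such norm-based analyses are phrased in terms of: per-layer norms of the weight matrices and their product. The layer sizes, random initialization, and the use of Frobenius norms are assumptions for illustration; the paper's actual bound and its additional assumptions are not reproduced here.

```python
import numpy as np

# Toy illustration of a norm-based capacity proxy: the product of per-layer
# Frobenius norms of a feedforward network's weight matrices.
rng = np.random.default_rng(4)
layer_widths = [784, 512, 512, 10]
weights = [rng.normal(scale=1.0 / np.sqrt(m), size=(n, m))
           for m, n in zip(layer_widths[:-1], layer_widths[1:])]

frob = [np.linalg.norm(W) for W in weights]          # Frobenius norm of each layer
print("per-layer Frobenius norms:", np.round(frob, 2))
print("product of norms (capacity proxy):", np.prod(frob))
```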
Stochastic Convex Optimization
TLDR
Stochastic convex optimization is studied, and it is shown that the key ingredient is strong convexity and regularization, which is only a sufficient, but not a necessary, condition for non-trivial learnability.
Multi-Player Bandits - a Musical Chairs Approach
TLDR
This work provides a communication-free algorithm (Musical Chairs) which attains constant regret with high probability, as well as a sublinear-regret, communication-free algorithm (Dynamic Musical Chairs) for the more difficult setting of players dynamically entering and leaving throughout the game.
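A simplified, hedged simulation of the Musical Chairs idea is sketched below: players first estimate the arm means from collision-free random pulls, then each repeatedly picks a random arm among its estimated top-N arms and "sits" once it experiences no collision. For brevity the number of players N is assumed known here (the paper's algorithm also estimates it), and the constants are illustrative.

```python
import numpy as np

# Simplified Musical Chairs simulation: exploration by uniform random pulls
# (collisions yield no reward), then a musical-chairs phase over the top-N arms.
rng = np.random.default_rng(5)
K, N, T0 = 6, 3, 3000                         # arms, players, exploration rounds
means = rng.uniform(0.2, 0.9, size=K)

rewards = np.zeros((N, K))
pulls = np.zeros((N, K))
for _ in range(T0):                           # exploration phase
    choices = rng.integers(0, K, size=N)
    for p, a in enumerate(choices):
        if np.sum(choices == a) == 1:         # collision-free pull yields a reward
            pulls[p, a] += 1
            rewards[p, a] += rng.binomial(1, means[a])
estimates = rewards / np.maximum(pulls, 1)

top = [np.argsort(est)[::-1][:N] for est in estimates]   # each player's estimated top-N
seat = [None] * N
for _ in range(1000):                         # musical chairs phase (capped for the demo)
    if all(s is not None for s in seat):
        break
    choices = [s if s is not None else rng.choice(top[p]) for p, s in enumerate(seat)]
    for p, a in enumerate(choices):
        if seat[p] is None and choices.count(a) == 1:    # no collision: sit on this arm
            seat[p] = int(a)

best = sorted(int(i) for i in np.argsort(means)[::-1][:N])
print("occupied arms:", sorted(seat), " best arms:", best)
```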
Adaptively Learning the Crowd Kernel
TLDR
An algorithm is introduced that, given n objects, learns a similarity matrix over all n^2 pairs from crowdsourced data alone, and SVM experiments reveal that the crowd kernel captures prominent and subtle features across a number of domains.
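To make the data model concrete, the sketch below learns an embedding, and hence a similarity matrix, from triplet judgements of the form "a is more similar to b than to c". The hinge loss, plain gradient descent, and synthetic triplets are illustrative stand-ins; the paper uses a probabilistic model of the judgements and adaptively chooses which triplets to query.

```python
import numpy as np

# Hedged sketch: fit an embedding to triplet comparisons, then read off a
# similarity matrix.  Ground-truth positions stand in for crowd judgements.
rng = np.random.default_rng(6)
n, dim = 30, 2
truth = rng.normal(size=(n, dim))                 # hidden "true" embedding of the n objects

def sample_triplet():
    a, b, c = rng.choice(n, size=3, replace=False)
    if np.linalg.norm(truth[a] - truth[b]) > np.linalg.norm(truth[a] - truth[c]):
        b, c = c, b                               # relabel so that (a, b) is the closer pair
    return a, b, c

triplets = [sample_triplet() for _ in range(4000)]

E = rng.normal(scale=0.1, size=(n, dim))          # learned embedding, initialized small
lr, margin = 0.05, 0.1
for _ in range(30):
    for a, b, c in triplets:
        d_ab = np.sum((E[a] - E[b]) ** 2)
        d_ac = np.sum((E[a] - E[c]) ** 2)
        if d_ab + margin > d_ac:                  # triplet violated (or within the margin)
            ga, gb, gc = 2 * (E[c] - E[b]), 2 * (E[b] - E[a]), 2 * (E[a] - E[c])
            E[a] -= lr * ga
            E[b] -= lr * gb
            E[c] -= lr * gc

K = E @ E.T                                       # the learned similarity matrix
acc = np.mean([np.sum((E[a] - E[b]) ** 2) < np.sum((E[a] - E[c]) ** 2)
               for a, b, c in triplets])
print("fraction of training triplets satisfied:", acc)
```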
The Power of Depth for Feedforward Neural Networks
TLDR
It is shown that there is a simple (approximately radial) function on $\mathbb{R}^d$, expressible by a small 3-layer feedforward neural network, which cannot be approximated by any 2-layer network unless its width is exponential in the dimension.
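The snippet below illustrates only the easy direction, and with a generic radial bump rather than the paper's specific function: it writes x -> max(0, 1 - ||x||^2) explicitly as a 2-hidden-layer (i.e. 3-layer) ReLU network whose first hidden layer approximates each x_i^2 by a sum of ReLUs and whose second layer applies the univariate bump to the approximate squared norm. The activation, the breakpoints, and the bump are assumptions for illustration; the inapproximability argument is not reproduced.

```python
import numpy as np

# A radial bump written explicitly as a small 2-hidden-layer ReLU network
# (outputs are approximate, up to the piecewise-linear error in t -> t^2).
def relu(z):
    return np.maximum(z, 0.0)

def radial_bump_net(d, B=2.0, k=40):
    knots = np.linspace(-B, B, k + 1)            # breakpoints for approximating t^2
    slopes = knots[:-1] + knots[1:]              # slope of the interpolant on each piece
    dslopes = np.diff(slopes, prepend=0.0)       # slope increments at each breakpoint

    # First hidden layer: one unit ReLU(x_i - t_j) per (coordinate i, breakpoint j).
    W1 = np.zeros((d * k, d))
    b1 = np.zeros(d * k)
    for i in range(d):
        W1[i * k:(i + 1) * k, i] = 1.0
        b1[i * k:(i + 1) * k] = -knots[:-1]

    # Second hidden layer: a single unit computing ReLU(1 - approx(||x||^2)).
    W2 = -np.tile(dslopes, d)[None, :]
    b2 = np.array([1.0 - d * B ** 2])            # each coordinate's interpolant starts at B^2

    return lambda x: relu(W2 @ relu(W1 @ x + b1) + b2)[0]

d = 5
net = radial_bump_net(d)
rng = np.random.default_rng(7)
for _ in range(3):
    x = rng.uniform(-1.0, 1.0, size=d)
    print(round(net(x), 3), round(max(0.0, 1.0 - float(x @ x)), 3))
```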
...
...