Federated Learning: Strategies for Improving Communication Efficiency
- Jakub Konečný, H. B. McMahan, Felix X. Yu, Peter Richtárik, A. Suresh, D. Bacon
- Computer Science, ArXiv
- 18 October 2016
Two ways to reduce the uplink communication costs are proposed: structured updates, where the user directly learns an update from a restricted space parametrized using a smaller number of variables, e.g. either low-rank or a random mask; and sketched updates, which learn a full model update and then compress it using a combination of quantization, random rotations, and subsampling.
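The sketched-updates idea can be illustrated with a minimal compression round trip: subsample a fraction of the update's coordinates (a random mask), then uniformly quantize the surviving values. The function names, the `keep_frac`/`num_levels` parameters, and the simple scalar quantizer below are illustrative assumptions, not the paper's exact scheme (which also uses random rotations).

```python
import numpy as np

def sketch_update(update, keep_frac=0.1, num_levels=16, seed=0):
    """Compress a model update by random subsampling followed by uniform
    quantization, in the spirit of 'sketched updates'. Illustrative only."""
    rng = np.random.default_rng(seed)
    d = update.size
    # Random mask: transmit only a fraction of the coordinates (subsampling).
    keep = rng.choice(d, size=max(1, int(keep_frac * d)), replace=False)
    values = update[keep]
    # Uniform quantization of the surviving values to num_levels buckets.
    lo, hi = values.min(), values.max()
    scale = (hi - lo) / (num_levels - 1) if hi > lo else 1.0
    q = np.round((values - lo) / scale).astype(np.int8)
    return keep, q, lo, scale

def unsketch(keep, q, lo, scale, d):
    """Server-side reconstruction of a sparse, quantized update estimate."""
    out = np.zeros(d)
    out[keep] = lo + q.astype(np.float64) * scale
    return out
```

Only the indices, the small-integer codes, and two scalars travel over the uplink; the quantization error per kept coordinate is at most half the bucket width.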
Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function
A randomized block-coordinate descent method for minimizing the sum of a smooth and a simple nonsmooth block-separable convex function is developed, and it is proved that the method obtains an accurate solution with probability at least 1 − ρ in at most O(n/ε) iterations, thus achieving the first true iteration complexity bounds.
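The core loop of randomized coordinate descent is short enough to sketch. The snippet below handles only the smooth quadratic case f(x) = ½xᵀAx − bᵀx with single-coordinate blocks and exact coordinate-wise steps; the paper's composite (smooth + nonsmooth) setting and the probability-(1 − ρ) complexity analysis are not reproduced here.

```python
import numpy as np

def rand_coord_descent(A, b, iters=2000, seed=0):
    """Minimize f(x) = 0.5 x^T A x - b^T x (A symmetric positive definite)
    by updating one uniformly random coordinate per iteration.
    A minimal sketch of randomized coordinate descent."""
    rng = np.random.default_rng(seed)
    n = b.size
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.integers(n)
        # Partial derivative of f at x with respect to coordinate i.
        g_i = A[i] @ x - b[i]
        # Exact minimization along coordinate i: step size 1 / A[i, i].
        x[i] -= g_i / A[i, i]
    return x
```

Each iteration touches one row of A, which is what makes the method attractive when n is large.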
Federated Optimization: Distributed Machine Learning for On-Device Intelligence
We introduce a new and increasingly relevant setting for distributed optimization in machine learning, where the data defining the optimization are unevenly distributed over an extremely large number…
Generalized Power Method for Sparse Principal Component Analysis
- M. Journée, Y. Nesterov, Peter Richtárik, R. Sepulchre
- Computer Science, J. Mach. Learn. Res.
- 28 November 2008
A new approach to sparse principal component analysis (sparse PCA) is developed, aimed at extracting a single sparse dominant principal component of a data matrix, or several such components at once.
Parallel coordinate descent methods for big data optimization
In this work we show that randomized (block) coordinate descent methods can be accelerated by parallelization when applied to the problem of minimizing the sum of a partially separable smooth convex…
Tighter Theory for Local SGD on Identical and Heterogeneous Data
We provide a new analysis of local SGD, removing unnecessary assumptions and elaborating on the difference between two data regimes: identical and heterogeneous. In both cases, we improve the…
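The local SGD scheme analyzed above alternates local gradient steps with periodic averaging. The toy problem below (each worker fits a scalar mean to its own samples, so the data can be made identical or heterogeneous across workers) and all parameter names are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def local_sgd(data_per_worker, rounds=50, local_steps=10, lr=0.1):
    """Local SGD sketch: each worker runs several local gradient steps on
    its own data, then the local models are averaged (the communication
    step). Workers minimize mean((x - s)^2) / 2 over their samples s."""
    x = 0.0  # shared scalar model
    for _ in range(rounds):
        local_models = []
        for samples in data_per_worker:
            xi = x
            for _ in range(local_steps):
                # Gradient of the worker's local objective at xi.
                xi -= lr * (xi - samples.mean())
            local_models.append(xi)
        x = float(np.mean(local_models))  # synchronize by averaging
    return x
```

With heterogeneous per-worker data as below, the averaged iterate still converges to the mean of the workers' local minimizers on this quadratic objective.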
Adding vs. Averaging in Distributed Primal-Dual Optimization
- Chenxin Ma, Virginia Smith, Martin Jaggi, Michael I. Jordan, Peter Richtárik, Martin Takáč
- Computer Science, ICML
- 11 February 2015
A novel generalization of the recent communication-efficient primal-dual framework (COCOA) for distributed optimization, which allows for additive combination of local updates to the global parameters at each iteration, whereas previous schemes with convergence guarantees only allow conservative averaging.
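The adding-versus-averaging distinction reduces to how per-machine updates are folded into the global iterate. The sketch below shows only that combination step; the function name and `mode` flag are hypothetical, and the paper's real contribution, safely adjusted local subproblems that make the additive combination convergent, is not modeled here.

```python
import numpy as np

def combine_updates(x, local_deltas, mode="average"):
    """Fold per-machine local updates into the global parameters.
    'average' is the conservative scheme; 'add' applies the full sum of
    the local updates. Minimal illustration of the two combination rules."""
    deltas = np.stack(local_deltas)
    if mode == "average":
        return x + deltas.mean(axis=0)
    return x + deltas.sum(axis=0)  # additive combination
```

Averaging scales each machine's contribution down by the number of machines, which is safe but slow; the additive rule keeps the full local progress.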
Accelerated, Parallel, and Proximal Coordinate Descent
A new randomized coordinate descent method is proposed for minimizing a sum of convex functions, each depending on only a small number of coordinates; it can be implemented without full-dimensional vector operations, the major bottleneck of accelerated coordinate descent.
Stochastic Primal-Dual Hybrid Gradient Algorithm with Arbitrary Sampling and Imaging Applications
- A. Chambolle, Matthias Joachim Ehrhardt, Peter Richtárik, C. Schönlieb
- Mathematics, Computer Science, SIAM J. Optim.
- 15 June 2017
We propose a stochastic extension of the primal-dual hybrid gradient algorithm studied by Chambolle and Pock in 2011 to solve saddle point problems that are separable in the dual variable. The anal...
Distributed Coordinate Descent Method for Learning with Big Data
This paper develops and analyzes Hydra: HYbriD cooRdinAte descent method for solving loss minimization problems with big data, and gives bounds on the number of iterations sufficient to approximately solve the problem with high probability.