Tighter Theory for Local SGD on Identical and Heterogeneous Data
We provide a new analysis of local SGD, removing unnecessary assumptions and elaborating on the difference between two data regimes: identical and heterogeneous. In both cases, we improve the…
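The local-update scheme the abstract refers to can be illustrated with a minimal sketch (not the paper's algorithm or analysis): each worker takes several gradient steps locally, and communication happens only when the models are averaged. The quadratic objectives, step size, and round counts below are illustrative assumptions, and full gradients stand in for stochastic ones.

```python
import numpy as np

def local_sgd(grads, x0, n_rounds=50, local_steps=10, lr=0.1):
    """Local SGD sketch: each worker runs several local gradient steps,
    then the models are averaged (one communication per round)."""
    x = np.array(x0, dtype=float)
    for _ in range(n_rounds):
        local_models = []
        for g in grads:                # one gradient oracle per worker
            w = x.copy()
            for _ in range(local_steps):
                w -= lr * g(w)         # local update, no communication
            local_models.append(w)
        x = np.mean(local_models, axis=0)  # communication: average models
    return x

# Heterogeneous data regime: worker i holds f_i(x) = 0.5 * ||x - a_i||^2,
# so the global optimum is the mean of the a_i.
targets = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]
grads = [lambda w, a=a: w - a for a in targets]
x_star = local_sgd(grads, x0=[5.0, 5.0])
```

Despite each worker drifting toward its own local optimum between communications, averaging pulls the iterate to the global optimum `[0.5, 1.0]`.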
Distributed Learning with Compressed Gradient Differences
- Konstantin Mishchenko, Eduard A. Gorbunov, Martin Takáč, Peter Richtárik
- Computer Science, ArXiv
- 26 January 2019
This work proposes DIANA, a new distributed learning method that compresses gradient differences rather than the gradients themselves. A theoretical analysis in the strongly convex and nonconvex settings shows that its rates are superior to existing rates.
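The idea of compressing gradient differences can be sketched as follows (a simplified illustration, not the paper's exact algorithm): each worker keeps a shift h_i, transmits only a compressed version of g_i - h_i, and learns the shift over time, so the quantity being compressed shrinks to zero. The random-coordinate compressor, step sizes, and objectives below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_coord(v):
    """Unbiased random-coordinate compressor: keep one coordinate, rescale."""
    d = v.size
    j = rng.integers(d)
    out = np.zeros_like(v)
    out[j] = d * v[j]
    return out

def diana_style(grads, x0, n_iters=3000, lr=0.02, alpha=0.25):
    """Sketch of a DIANA-style loop: workers compress gradient *differences*
    g_i - h_i and learn the shift h_i, so the compression error vanishes
    as h_i approaches the local gradient at the optimum."""
    x = np.array(x0, dtype=float)
    h = [np.zeros_like(x) for _ in grads]
    for _ in range(n_iters):
        g_hat = np.zeros_like(x)
        for i, g in enumerate(grads):
            delta = rand_coord(g(x) - h[i])  # only this is transmitted
            g_hat += h[i] + delta            # server reconstructs an estimate
            h[i] = h[i] + alpha * delta      # worker and server update shift
        x -= lr * g_hat / len(grads)
    return x

# Workers hold f_i(x) = 0.5 * ||x - a_i||^2; the optimum is the mean of a_i.
targets = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]
grads = [lambda w, a=a: w - a for a in targets]
x_star = diana_style(grads, x0=[5.0, 5.0])
```

Because h_i converges to the local gradient at the optimum, the compressed difference vanishes there, which is what enables convergence to the exact solution rather than a neighborhood.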
Random Reshuffling: Simple Analysis with Vast Improvements
For strongly convex objectives, the theory tightly matches the known lower bounds for both Random Reshuffling (RR) and Shuffle-Once (SO), proves fast convergence of the Shuffle-Once algorithm, which shuffles the data only once, and substantiates the common practical heuristic of shuffling once or only a few times.
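The sampling scheme under study is simple to state in code. Below is a minimal sketch of Random Reshuffling (illustrative only; the quadratic objectives and hyperparameters are assumptions, not from the paper): each epoch draws a fresh permutation and visits every component function exactly once.

```python
import numpy as np

def random_reshuffling(grads, x0, n_epochs=200, lr=0.02, seed=0):
    """Random Reshuffling sketch: sample a fresh permutation each epoch
    and take one step per component function, without replacement."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    n = len(grads)
    for _ in range(n_epochs):
        for i in rng.permutation(n):  # Shuffle-Once would fix this order
            x -= lr * grads[i](x)
    return x

# Components f_i(x) = 0.5 * ||x - a_i||^2; the minimizer of the average
# is the mean of the a_i.
targets = [np.array([1.0, 0.0]), np.array([0.0, 2.0]), np.array([2.0, 1.0])]
grads = [lambda w, a=a: w - a for a in targets]
x_star = random_reshuffling(grads, x0=[5.0, 5.0])
```

With a small fixed step size the iterate settles in an O(lr)-neighborhood of the exact minimizer `[1.0, 1.0]`, which is the tighter behavior (compared to with-replacement SGD) that the analysis quantifies.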
Stochastic Distributed Learning with Gradient Quantization and Variance Reduction
- Samuel Horváth, D. Kovalev, Konstantin Mishchenko, Sebastian U. Stich, Peter Richtárik
- Computer Science
- 10 April 2019
These are the first methods that achieve linear convergence for arbitrary quantized updates in distributed optimization where the objective function is spread among different devices, each sending incremental model updates to a central server.
First Analysis of Local GD on Heterogeneous Data
It is shown that in the low-accuracy regime, local gradient descent has the same communication complexity as gradient descent.
Revisiting Stochastic Extragradient
- Konstantin Mishchenko, D. Kovalev, Egor Shulgin, Peter Richtárik, Yura Malitsky
- Computer Science, AISTATS
- 27 May 2019
This work fixes a fundamental issue in the stochastic extragradient method by providing a new sampling strategy motivated by approximating implicit updates, and proves guarantees for solving variational inequalities that go beyond existing settings.
Proximal and Federated Random Reshuffling
- Konstantin Mishchenko, Ahmed Khaled, Peter Richtárik
- Computer Science, Mathematics, ArXiv
- 12 February 2021
Two new algorithms, Proximal and Federated Random Reshuffling (ProxRR and FedRR), are proposed for composite convex finite-sum minimization problems in which the objective is the sum of a (potentially non-smooth) convex regularizer and an average of n smooth objectives.
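A distinctive feature of ProxRR is that the proximal operator is applied once per epoch rather than once per step. The following is a hedged sketch of that structure (the L1 regularizer, soft-thresholding prox, objectives, and step sizes are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_rr(grads, x0, lam=0.1, n_epochs=200, lr=0.02, seed=0):
    """ProxRR-style sketch: one Random Reshuffling epoch over the smooth
    parts, then a single proximal step for the regularizer per epoch."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    n = len(grads)
    for _ in range(n_epochs):
        for i in rng.permutation(n):
            x -= lr * grads[i](x)
        # prox once per epoch, with step size scaled by lr * n
        x = soft_threshold(x, lam * lr * n)
    return x

# Smooth parts f_i(x) = 0.5 * (x - a_i)^2 with a_1 = 1, a_2 = 0, plus
# lam * |x|; the composite minimizer is soft_threshold(mean(a), lam) = 0.4.
targets = [np.array([1.0]), np.array([0.0])]
grads = [lambda w, a=a: w - a for a in targets]
x_sol = prox_rr(grads, x0=[3.0])
```

Evaluating the prox once per epoch is what makes the method cheap when the proximal operator is expensive: its cost is amortized over n gradient steps.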
SEGA: Variance Reduction via Gradient Sketching
- Filip Hanzely, Konstantin Mishchenko, Peter Richtárik
- Computer Science, Mathematics, NeurIPS
- 9 September 2018
We propose a randomized first-order optimization method, SEGA (SkEtched GrAdient method), which progressively, throughout its iterations, builds a variance-reduced estimate of the gradient from random…
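The gradient-sketching idea can be sketched for the simplest case of coordinate sketches (an illustrative simplification; the objective, step size, and iteration count are assumptions): each step reveals only one coordinate of the true gradient, a running estimate h is updated, and an unbiased, progressively variance-reduced gradient estimate is formed.

```python
import numpy as np

def sega(grad, x0, n_iters=20000, lr=0.05, seed=0):
    """SEGA-style sketch with coordinate sketches: each step observes one
    coordinate of the gradient, maintains an estimate h of the full
    gradient, and forms an unbiased estimate g with E[g] = grad(x)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    d = x.size
    h = np.zeros(d)
    for _ in range(n_iters):
        j = rng.integers(d)
        gj = grad(x)[j]              # the only gradient information used
        g = h.copy()
        g[j] += d * (gj - h[j])      # unbiased: E[g] = grad(x)
        h[j] = gj                    # refresh the running estimate
        x -= lr * g
    return x

# Quadratic f(x) = 0.5 * ||x - a||^2 with minimizer a = [1, 2].
a = np.array([1.0, 2.0])
x_star = sega(lambda x: x - a, x0=[5.0, -3.0])
```

As x stabilizes, h converges to the true gradient, so the variance of g vanishes at the optimum; this is what allows exact convergence from such limited gradient information.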
A Delay-tolerant Proximal-Gradient Algorithm for Distributed Learning
This work proposes and analyzes a flexible asynchronous optimization algorithm for nonsmooth learning problems and proves that it converges linearly with a fixed learning rate that depends on neither the communication delays nor the number of machines.
MISO is Making a Comeback With Better Proofs and Rates
- Xun Qian, Alibek Sailanbayev, Konstantin Mishchenko, Peter Richtárik
- Computer Science, Mathematics
- 4 June 2019
Numerical experiments show that MISO is a serious competitor to SAGA and SVRG, sometimes outperforming them on real datasets. The analysis also derives minibatch bounds under arbitrary uniform sampling that yield linear speedup when the expected minibatch size lies in a certain range.