# Byzantine-Resilient SGD in High Dimensions on Heterogeneous Data

@article{Data2021ByzantineResilientSI,
title={Byzantine-Resilient SGD in High Dimensions on Heterogeneous Data},
author={Deepesh Data and Suhas N. Diggavi},
journal={2021 IEEE International Symposium on Information Theory (ISIT)},
year={2021},
pages={2310-2315}
}
• Published 16 May 2020
• Computer Science
• 2021 IEEE International Symposium on Information Theory (ISIT)
We study distributed stochastic gradient descent (SGD) in the master-worker architecture under Byzantine attacks. We consider the heterogeneous data model, where different workers may have different local datasets, and we do not make any probabilistic assumptions on data generation. At the core of our algorithm, we use the polynomial-time outlier-filtering procedure for robust mean estimation proposed by Steinhardt et al. (ITCS 2018) to filter out corrupt gradients. In order to be able to apply…
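
As a rough illustration of the filtering idea, the sketch below removes, one at a time, the point that deviates most along the top principal direction of the empirical covariance, until the leading variance falls below a threshold. The threshold value and the one-point-per-round removal rule are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def filtered_mean(points, max_iter=20):
    """Spectral outlier filtering for robust mean estimation (sketch)."""
    pts = np.array(points, dtype=float)
    for _ in range(max_iter):
        mu = pts.mean(axis=0)
        centered = pts - mu
        cov = centered.T @ centered / len(pts)
        # eigh returns eigenvalues in ascending order; take the top pair.
        eigvals, eigvecs = np.linalg.eigh(cov)
        lam, v = eigvals[-1], eigvecs[:, -1]
        if lam <= 1.0:  # assumed variance bound for inlier gradients
            break
        # Drop the point farthest along the suspicious direction.
        scores = (centered @ v) ** 2
        pts = np.delete(pts, np.argmax(scores), axis=0)
    return pts.mean(axis=0)

# 90 inlier gradients near zero, 10 large corruptions
rng = np.random.default_rng(0)
good = rng.normal(0.0, 0.1, size=(90, 5))
bad = np.full((10, 5), 50.0)
est = filtered_mean(np.vstack([good, bad]))
```

In the SGD setting, each row would be one worker's gradient at the current iterate, and the filtered mean replaces the naive average in the master's update.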

## 21 Citations

### Byzantine-Resilient High-Dimensional SGD with Local Iterations on Heterogeneous Data

• Computer Science
ICML
• 2021
This work is believed to be the first Byzantine-resilient algorithm and analysis with local iterations in the presence of malicious/Byzantine clients; it derives convergence results under minimal assumptions of bounded variance for SGD and bounded gradient dissimilarity in the statistically heterogeneous data setting.

### On Byzantine-Resilient High-Dimensional Stochastic Gradient Descent

• Computer Science
2020 IEEE International Symposium on Information Theory (ISIT)
• 2020
The authors' algorithm tolerates fewer than a $\frac{1}{3}$ fraction of Byzantine workers, approximately finds the optimal parameters exponentially fast in the strongly convex setting, and reaches an approximate stationary point at a linear rate in the non-convex setting, thus matching the convergence rates of vanilla SGD in the Byzantine-free setting.

### Byzantine-Resilient High-Dimensional Federated Learning

• Computer Science
• 2020
This work is believed to be the first Byzantine-resilient algorithm and analysis with local iterations in the presence of malicious/Byzantine clients, and derives convergence results under minimal assumptions of bounded variance for SGD and bounded gradient dissimilarity (which captures heterogeneity among local datasets).

### Byzantine-Robust Learning on Heterogeneous Datasets via Resampling

• Computer Science
ArXiv
• 2020
This work proposes a simple resampling scheme that adapts existing robust algorithms to heterogeneous datasets at negligible computational cost, and theoretically and experimentally validates the approach, showing that combining resampling with existing robust algorithms is effective against challenging attacks.

### Byzantine-Robust Learning on Heterogeneous Datasets via Bucketing

• Computer Science
• 2020
This work proposes a simple bucketing scheme that adapts existing robust algorithms to heterogeneous datasets at negligible computational cost, and theoretically and experimentally validates the approach, showing that combining bucketing with existing robust algorithms is effective against challenging attacks.
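
The bucketing idea can be sketched in a few lines: shuffle the workers' gradients, average them in small buckets, and hand the bucket means to any existing robust aggregator. Coordinate-wise median is used below as a stand-in aggregator, and the bucket size `s=2` is an illustrative choice:

```python
import numpy as np

def bucketed_aggregate(grads, s=2, rng=None, aggregator=None):
    """Bucketing sketch: average random groups of size s, then apply
    a robust aggregator to the (lower-variance) bucket means."""
    if rng is None:
        rng = np.random.default_rng()
    if aggregator is None:
        aggregator = lambda g: np.median(g, axis=0)  # stand-in aggregator
    g = np.array(grads, dtype=float)
    idx = rng.permutation(len(g))
    buckets = [g[idx[i:i + s]].mean(axis=0) for i in range(0, len(g), s)]
    return aggregator(np.array(buckets))

# 8 honest workers, 2 Byzantine workers sending large gradients
grads = [np.ones(3)] * 8 + [np.full(3, 100.0)] * 2
out = bucketed_aggregate(grads, s=2, rng=np.random.default_rng(1))
```

With 2 corrupt gradients among 10, at most 2 of the 5 buckets are contaminated, so the median of the bucket means still recovers the honest value.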

### Robust Training in High Dimensions via Block Coordinate Geometric Median Descent

• Computer Science
AISTATS
• 2022
By applying GM to only a judiciously chosen block of coordinates at a time and using a memory mechanism, one can retain the breakdown point of 1/2 for smooth non-convex problems, with non-asymptotic convergence rates comparable to the SGD with GM while resulting in significant speedup in training.
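
The geometric median (GM) itself is typically computed by Weiszfeld's iteration, sketched below; the block-coordinate variant described above would apply such an update only to a judiciously chosen block of coordinates per step, which this simplified sketch omits:

```python
import numpy as np

def geometric_median(points, iters=100, eps=1e-8):
    """Weiszfeld's iteration for the geometric median (sketch)."""
    pts = np.array(points, dtype=float)
    z = pts.mean(axis=0)  # initialize at the plain average
    for _ in range(iters):
        d = np.maximum(np.linalg.norm(pts - z, axis=1), eps)  # avoid /0
        w = 1.0 / d
        z_new = (w[:, None] * pts).sum(axis=0) / w.sum()
        if np.linalg.norm(z_new - z) < eps:
            return z_new
        z = z_new
    return z

# 4 honest gradients at the origin, 1 Byzantine gradient far away
pts = [np.zeros(2)] * 4 + [np.array([10.0, 10.0])]
gm = geometric_median(pts)
```

Unlike the mean, the GM is pulled only slightly by the single outlier, which is the source of its 1/2 breakdown point.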

### A Simplified Convergence Theory for Byzantine Resilient Stochastic Gradient Descent

• Computer Science
• 2022
A simplified convergence theory for the generic Byzantine-resilient SGD method originally proposed by Blanchard et al. is presented.

### Learning from History for Byzantine Robust Optimization

• Computer Science
ICML
• 2021
This work presents two surprisingly simple strategies: a new robust iterative clipping procedure, and worker momentum to overcome time-coupled attacks, yielding the first provably robust method for the standard stochastic optimization setting.
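
The iterative clipping idea can be sketched as centered clipping: each gradient's deviation from the current estimate is clipped to a radius `tau` before averaging, so a single Byzantine gradient can move the estimate by at most `tau / n` per round. The values of `tau` and the iteration count below are illustrative assumptions:

```python
import numpy as np

def centered_clip(grads, v, tau=1.0, iters=3):
    """Iterative centered clipping (sketch of the robust aggregation)."""
    g = np.array(grads, dtype=float)
    for _ in range(iters):
        diff = g - v
        norms = np.maximum(np.linalg.norm(diff, axis=1, keepdims=True), 1e-12)
        scale = np.minimum(1.0, tau / norms)   # shrink large deviations
        v = v + (diff * scale).mean(axis=0)
    return v

# 9 honest zero gradients, 1 Byzantine gradient of norm ~173
grads = [np.zeros(3)] * 9 + [np.full(3, 100.0)]
v = centered_clip(grads, np.zeros(3), tau=1.0)
```

Honest gradients close to the estimate pass through unclipped, so the rule stays unbiased in the attack-free case.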

### Byzantine-Resilient Decentralized Stochastic Optimization with Robust Aggregation Rules

• Computer Science
ArXiv
• 2022
Following the guidelines, an iterative filtering-based robust aggregation rule termed iterative outlier scissor (IOS) is proposed, which has provable Byzantine-resilience and is shown to be effective in decentralized stochastic optimization.

## References

Showing 1–10 of 41 references.

### DRACO: Byzantine-resilient Distributed Training via Redundant Gradients

• Computer Science
ICML
• 2018
DRACO is presented, a scalable framework for robust distributed training that uses ideas from coding theory and comes with problem-independent robustness guarantees; it is shown to be several times to orders of magnitude faster than median-based approaches.

### Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent

• Computer Science
NIPS
• 2017
Krum is proposed, an aggregation rule satisfying a resilience property that captures the basic requirements to guarantee convergence despite f Byzantine workers; it is argued to be the first provably Byzantine-resilient algorithm for distributed SGD.
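
A minimal sketch of the Krum rule: with n workers and at most f Byzantine, each gradient is scored by the summed squared distance to its n − f − 2 closest peers, and the lowest-scoring gradient is selected:

```python
import numpy as np

def krum(grads, f):
    """Krum (sketch): return the gradient closest to its n-f-2 peers."""
    g = np.array(grads, dtype=float)
    n = len(g)
    # Pairwise squared Euclidean distances between all gradients.
    d = ((g[:, None, :] - g[None, :, :]) ** 2).sum(-1)
    scores = []
    for i in range(n):
        others = np.delete(d[i], i)                 # exclude self-distance
        scores.append(np.sort(others)[: n - f - 2].sum())
    return g[int(np.argmin(scores))]

# 8 honest identical gradients, 2 Byzantine ones far away
grads = [np.ones(3)] * 8 + [np.full(3, 100.0)] * 2
chosen = krum(grads, f=2)
```

Because a Byzantine gradient is far from the honest majority, its score is large and it is never selected.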

### Sparsified SGD with Memory

• Computer Science
NeurIPS
• 2018
This work analyzes Stochastic Gradient Descent with k-sparsification or compression (for instance top-k or random-k) and shows that this scheme converges at the same rate as vanilla SGD when equipped with error compensation.
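
The top-k-with-memory scheme can be sketched as follows: the residual from previous rounds is added back before selecting the k largest-magnitude coordinates, and whatever is not transmitted is carried forward as memory:

```python
import numpy as np

def topk_with_memory(grad, memory, k):
    """Top-k sparsification with error compensation (sketch)."""
    acc = grad + memory                     # add back past residual
    out = np.zeros_like(acc)
    idx = np.argsort(np.abs(acc))[-k:]      # k largest-magnitude coords
    out[idx] = acc[idx]
    new_memory = acc - out                  # remember what was dropped
    return out, new_memory

out, mem = topk_with_memory(np.array([5.0, 0.1, -3.0, 0.2]),
                            np.zeros(4), k=2)
```

The error-compensation memory is what lets the compressed scheme match vanilla SGD's rate: no coordinate's contribution is lost, only delayed.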

### signSGD: compressed optimisation for non-convex problems

• Computer Science
ICML
• 2018
SignSGD can get the best of both worlds: compressed gradients and SGD-level convergence rate, and the momentum counterpart of signSGD is able to match the accuracy and convergence speed of Adam on deep Imagenet models.
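
A minimal sketch of the communication pattern: each worker sends only coordinate-wise signs, and the server aggregates by majority vote:

```python
import numpy as np

def signsgd_majority(grads):
    """signSGD with majority vote (sketch): workers send one bit per
    coordinate; the server takes a coordinate-wise sign of the sum."""
    signs = np.sign(np.array(grads, dtype=float))
    return np.sign(signs.sum(axis=0))

grads = [np.array([1.0, -2.0]),
         np.array([3.0, -1.0]),
         np.array([-0.5, 4.0])]
step = signsgd_majority(grads)
```

The model update is then a fixed learning rate times this sign vector, giving 1-bit-per-coordinate communication in both directions.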

### Communication-Efficient and Byzantine-Robust Distributed Learning

• Computer Science
2020 Information Theory and Applications Workshop (ITA)
• 2020
It is shown that, in the regime when the compression factor δ is constant and the dimension of the parameter space is fixed, the rate of convergence is not affected by the compression operation, and hence the algorithm effectively gets the compression for free.

### Data Encoding Methods for Byzantine-Resilient Distributed Optimization

• Computer Science
2019 IEEE International Symposium on Information Theory (ISIT)
• 2019
A sparse encoding scheme is proposed that enables computationally efficient data encoding and works as efficiently in the streaming data setting as in the offline setting, where all the data is available beforehand.

### Byzantine-Robust Distributed Learning: Towards Optimal Statistical Rates

• Computer Science
ICML
• 2018
A main result of this work is a sharp analysis of two robust distributed gradient descent algorithms based on median and trimmed mean operations, respectively, which are shown to achieve order-optimal statistical error rates for strongly convex losses.
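
The coordinate-wise trimmed mean can be sketched in a few lines: in each coordinate, the β-fraction smallest and largest values are discarded before averaging (the coordinate-wise median arises when nearly half is trimmed on each side):

```python
import numpy as np

def coordinate_trimmed_mean(grads, beta):
    """Coordinate-wise beta-trimmed mean (sketch): drop the beta-fraction
    extreme values in each coordinate, then average the rest."""
    g = np.sort(np.array(grads, dtype=float), axis=0)  # sort per coordinate
    m = int(beta * len(g))
    return g[m:len(g) - m].mean(axis=0)

# 8 honest gradients, 2 Byzantine extremes in opposite directions
grads = [np.ones(2)] * 8 + [np.full(2, 100.0), np.full(2, -100.0)]
agg = coordinate_trimmed_mean(grads, beta=0.2)
```

Choosing β at least as large as the Byzantine fraction guarantees every surviving value in each coordinate lies between honest values.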

### Robust Estimators in High Dimensions without the Computational Intractability

• Computer Science
2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)
• 2016
This work obtains the first computationally efficient algorithms for agnostically learning several fundamental classes of high-dimensional distributions: a single Gaussian, a product distribution on the hypercube, mixtures of two product distributions (under a natural balancedness condition), and k Gaussians with identical spherical covariances.

### Securing Distributed Gradient Descent in High Dimensional Statistical Learning

• Computer Science
Proc. ACM Meas. Anal. Comput. Syst.
• 2019
A secured variant of the gradient descent method that can tolerate up to a constant fraction of Byzantine workers, and establishes a uniform concentration of the sample covariance matrix of gradients, and shows that the aggregated gradient, as a function of model parameter, converges uniformly to the true gradient function.

### Qsparse-Local-SGD: Distributed SGD With Quantization, Sparsification, and Local Computations

• Computer Science
IEEE Journal on Selected Areas in Information Theory
• 2020
This paper proposes Qsparse-local-SGD algorithm, which combines aggressive sparsification with quantization and local computation along with error compensation, by keeping track of the difference between the true and compressed gradients, and demonstrates that it converges at the same rate as vanilla distributed SGD for many important classes of sparsifiers and quantizers.