# Byzantine-Resilient SGD in High Dimensions on Heterogeneous Data

```bibtex
@article{Data2021ByzantineResilientSI,
  title   = {Byzantine-Resilient SGD in High Dimensions on Heterogeneous Data},
  author  = {Deepesh Data and Suhas N. Diggavi},
  journal = {2021 IEEE International Symposium on Information Theory (ISIT)},
  year    = {2021},
  pages   = {2310--2315}
}
```

We study distributed stochastic gradient descent (SGD) in the master-worker architecture under Byzantine attacks. We consider the heterogeneous data model, where different workers may have different local datasets, and we do not make any probabilistic assumptions on data generation. At the core of our algorithm, we use the polynomial-time outlier-filtering procedure for robust mean estimation proposed by Steinhardt et al. (ITCS 2018) to filter out corrupt gradients. In order to be able to apply…
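The filtering step described above can be sketched as follows. This is a generic spectral outlier filter in the spirit of the Steinhardt et al. procedure the abstract mentions, not the paper's exact algorithm; the function name `filtered_mean` and the stopping threshold `threshold_mult` are illustrative:

```python
import numpy as np

def filtered_mean(grads, eps, threshold_mult=9.0):
    """Generic spectral outlier filter for robust mean estimation (a sketch,
    not the paper's exact procedure): while the empirical covariance has a
    large top eigenvalue, drop the point with the largest squared projection
    onto the top eigenvector, then return the mean of the survivors."""
    pts = grads.copy()
    # remove at most a 2*eps fraction of the points
    while len(pts) > (1 - 2 * eps) * len(grads):
        mu = pts.mean(axis=0)
        centered = pts - mu
        cov = centered.T @ centered / len(pts)
        eigvals, eigvecs = np.linalg.eigh(cov)
        if eigvals[-1] <= threshold_mult:  # covariance already bounded: done
            return mu
        scores = (centered @ eigvecs[:, -1]) ** 2
        pts = np.delete(pts, np.argmax(scores), axis=0)
    return pts.mean(axis=0)
```

With a small Byzantine fraction, corrupt gradients dominate the top eigendirection of the covariance and are removed first, so the surviving mean stays close to the honest mean.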

## 21 Citations

### Byzantine-Resilient High-Dimensional SGD with Local Iterations on Heterogeneous Data

- Computer Science, ICML
- 2021

This work believes that it is the first Byzantine-resilient algorithm and analysis with local iterations in the presence of malicious/Byzantine clients, and derives convergence results under minimal assumptions of bounded variance for SGD and bounded gradient dissimilarity in the statistically heterogeneous data setting.

### On Byzantine-Resilient High-Dimensional Stochastic Gradient Descent

- Computer Science, 2020 IEEE International Symposium on Information Theory (ISIT)
- 2020

The authors' algorithm can tolerate less than a $\frac{1}{3}$ fraction of Byzantine workers, can approximately find the optimal parameters in the strongly-convex setting exponentially fast, and reaches an approximate stationary point in the non-convex setting at a linear speed, thus matching the convergence rates of vanilla SGD in the Byzantine-free setting.

### Byzantine-Resilient High-Dimensional Federated Learning

- Computer Science
- 2020

This work believes that it is the first Byzantine-resilient algorithm and analysis with local iterations in the presence of malicious/Byzantine clients, and derives convergence results under minimal assumptions of bounded variance for SGD and bounded gradient dissimilarity (which captures heterogeneity among local datasets).

### Byzantine-Robust Learning on Heterogeneous Datasets via Resampling

- Computer Science, arXiv
- 2020

This work proposes a simple resampling scheme that adapts existing robust algorithms to heterogeneous datasets at a negligible computational cost, and theoretically and experimentally validates the approach, showing that combining resampling with existing robust algorithms is effective against challenging attacks.

### Byzantine-Robust Learning on Heterogeneous Datasets via Bucketing

- Computer Science
- 2020

This work proposes a simple bucketing scheme that adapts existing robust algorithms to heterogeneous datasets at a negligible computational cost, and theoretically and experimentally validates the approach, showing that combining bucketing with existing robust algorithms is effective against challenging attacks.
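The bucketing idea can be sketched as follows; this is an illustrative reading of the scheme (shuffle, average in buckets of size `s`, then apply any existing robust rule), with function names that are not from the paper:

```python
import numpy as np

def cw_median(grads):
    """Example robust aggregator: coordinate-wise median."""
    return np.median(grads, axis=0)

def bucketed_aggregate(grads, s, robust_agg=cw_median, seed=0):
    """Sketch of bucketing: shuffle the worker gradients, average them in
    buckets of size s, and hand the bucket means to an existing robust
    aggregator. Averaging within buckets reduces the heterogeneity the
    robust rule has to cope with."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(grads))
    means = [grads[perm[i:i + s]].mean(axis=0)
             for i in range(0, len(grads), s)]
    return robust_agg(np.stack(means))
```

As long as fewer than half of the buckets contain a Byzantine gradient, the median over bucket means stays in the honest range.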

### A simplified convergence theory for Byzantine resilient stochastic gradient descent

- Computer Science, EURO Journal on Computational Optimization
- 2022

### Robust Training in High Dimensions via Block Coordinate Geometric Median Descent

- Computer Science, AISTATS
- 2022

By applying GM to only a judiciously chosen block of coordinates at a time and using a memory mechanism, one can retain the breakdown point of 1/2 for smooth non-convex problems, with non-asymptotic convergence rates comparable to SGD with GM, while achieving a significant speedup in training.
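A minimal sketch of the two ingredients the snippet names: the geometric median via the standard Weiszfeld fixed-point iteration, and its restriction to a block of coordinates. This is an illustrative reading only; the paper's memory mechanism and block-selection rule are omitted:

```python
import numpy as np

def weiszfeld_gm(points, iters=200, eps=1e-8):
    """Geometric median via the standard Weiszfeld iteration: re-weight
    points by the inverse of their distance to the current estimate."""
    z = points.mean(axis=0)
    for _ in range(iters):
        w = 1.0 / np.maximum(np.linalg.norm(points - z, axis=1), eps)
        z = (w[:, None] * points).sum(axis=0) / w.sum()
    return z

def block_gm(grads, block_idx):
    """Run GM on a chosen block of coordinates only, which cuts the
    per-step cost of the full-dimensional GM (sketch)."""
    return weiszfeld_gm(grads[:, block_idx])
```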

### A Simplified Convergence Theory for Byzantine Resilient Stochastic Gradient Descent

- Computer Science
- 2022

A simplified convergence theory for the generic Byzantine Resilient SGD method originally proposed by Blanchard et al. is presented.

### Learning from History for Byzantine Robust Optimization

- Computer Science, ICML
- 2021

This work presents two surprisingly simple strategies: a new robust iterative clipping procedure, and the incorporation of worker momentum to overcome time-coupled attacks, yielding the first provably robust method for the standard stochastic optimization setting.
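The clipping strategy can be sketched as follows. This is an illustrative centered-clipping iteration, not the paper's exact method: `tau` and `iters` are made-up defaults, and in the full method the inputs would be worker momenta rather than raw gradients:

```python
import numpy as np

def centered_clip(grads, center, tau=2.0, iters=10):
    """Sketch of iterative centered clipping: clip each worker vector to a
    ball of radius tau around the running estimate, average the clipped
    differences, and repeat."""
    z = center.astype(float).copy()
    for _ in range(iters):
        diffs = grads - z
        norms = np.linalg.norm(diffs, axis=1, keepdims=True)
        scale = np.minimum(1.0, tau / np.maximum(norms, 1e-12))
        z = z + (diffs * scale).mean(axis=0)
    return z
```

Each Byzantine worker can displace the estimate by at most roughly `tau` times its fraction of the population, regardless of how large its vector is.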

### Byzantine-Resilient Decentralized Stochastic Optimization with Robust Aggregation Rules

- Computer Science, arXiv
- 2022

Following the guidelines, an iterative filtering-based robust aggregation rule termed iterative outlier scissor (IOS) is proposed, which has provable Byzantine-resilience and is shown to be effective in decentralized stochastic optimization.

## References

Showing 1–10 of 41 references.

### DRACO: Byzantine-resilient Distributed Training via Redundant Gradients

- Computer Science, ICML
- 2018

DRACO is presented, a scalable framework for robust distributed training that uses ideas from coding theory and comes with problem-independent robustness guarantees, and is shown to be several times to orders of magnitude faster than median-based approaches.

### Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent

- Computer Science, NIPS
- 2017

Krum is proposed, an aggregation rule satisfying a resilience property that captures the basic requirements for guaranteeing convergence despite f Byzantine workers; it is argued to be the first provably Byzantine-resilient algorithm for distributed SGD.
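The Krum rule as commonly described can be sketched in a few lines (a minimal version; the paper also defines a Multi-Krum variant not shown here):

```python
import numpy as np

def krum(grads, f):
    """Krum selection rule: score each worker by the sum of squared
    distances to its n - f - 2 closest other gradients and return the
    gradient with the smallest score."""
    n = len(grads)
    sq_dists = np.linalg.norm(grads[:, None, :] - grads[None, :, :], axis=2) ** 2
    scores = [np.sort(np.delete(sq_dists[i], i))[: n - f - 2].sum()
              for i in range(n)]
    return grads[int(np.argmin(scores))]
```

Because each score only counts the n − f − 2 nearest neighbours, a Byzantine gradient far from the honest cluster cannot obtain a small score.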

### Sparsified SGD with Memory

- Computer Science, NeurIPS
- 2018

This work analyzes Stochastic Gradient Descent with k-sparsification or compression (for instance top-k or random-k) and shows that this scheme converges at the same rate as vanilla SGD when equipped with error compensation.
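The error-compensation mechanism can be sketched as follows; the class name `MemoryCompressor` is illustrative, but the invariant it maintains (nothing is permanently dropped) is the core of the scheme:

```python
import numpy as np

def top_k(v, k):
    """Keep only the k largest-magnitude entries of v."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

class MemoryCompressor:
    """Sketch of error compensation for sparsified SGD: the residual that
    was not transmitted in one round is added back before sparsifying the
    next gradient, so no component of the signal is permanently lost."""
    def __init__(self, dim, k):
        self.memory = np.zeros(dim)
        self.k = k

    def compress(self, grad):
        corrected = grad + self.memory
        msg = top_k(corrected, self.k)
        self.memory = corrected - msg  # carry untransmitted residual forward
        return msg
```

By construction, the sum of transmitted messages plus the current memory equals the sum of all gradients seen so far, which is what makes the convergence rate match vanilla SGD.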

### signSGD: compressed optimisation for non-convex problems

- Computer Science, ICML
- 2018

SignSGD can get the best of both worlds: compressed gradients and SGD-level convergence rate; the momentum counterpart of signSGD is able to match the accuracy and convergence speed of Adam on deep ImageNet models.
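The distributed variant of the idea (signSGD with majority vote) fits in one line; this is a minimal sketch of the aggregation step only, with the learning-rate update omitted:

```python
import numpy as np

def signsgd_majority(grads):
    """Sketch of signSGD with majority vote: workers transmit only the sign
    of each gradient coordinate, and the server descends along the
    coordinate-wise majority sign."""
    return np.sign(np.sign(grads).sum(axis=0))
```

Each worker sends one bit per coordinate, and the majority vote gives the rule a natural tolerance to a minority of adversarial sign flips.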

### Communication-Efficient and Byzantine-Robust Distributed Learning

- Computer Science, 2020 Information Theory and Applications Workshop (ITA)
- 2020

It is shown that, in the regime when the compression factor δ is constant and the dimension of the parameter space is fixed, the rate of convergence is not affected by the compression operation, and hence the algorithm effectively gets the compression for free.

### Data Encoding Methods for Byzantine-Resilient Distributed Optimization

- Computer Science, 2019 IEEE International Symposium on Information Theory (ISIT)
- 2019

A sparse encoding scheme is proposed that enables computationally efficient data encoding and works as efficiently in the streaming data setting as it does in the offline setting, in which all the data is available beforehand.

### Byzantine-Robust Distributed Learning: Towards Optimal Statistical Rates

- Computer Science, ICML
- 2018

A main result of this work is a sharp analysis of two robust distributed gradient descent algorithms based on median and trimmed mean operations, respectively, which are shown to achieve order-optimal statistical error rates for strongly convex losses.
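The two aggregation rules the snippet analyzes are standard and easy to state; a minimal sketch:

```python
import numpy as np

def cw_median(grads):
    """Coordinate-wise median of the worker gradients."""
    return np.median(grads, axis=0)

def trimmed_mean(grads, b):
    """Coordinate-wise trimmed mean: in each coordinate, drop the b largest
    and b smallest values and average the rest."""
    s = np.sort(grads, axis=0)
    return s[b: len(grads) - b].mean(axis=0)
```

With b at least the number of Byzantine workers, every extreme value a Byzantine worker can inject in a coordinate is discarded before averaging.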

### Robust Estimators in High Dimensions without the Computational Intractability

- Computer Science, 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)
- 2016

This work obtains the first computationally efficient algorithms for agnostically learning several fundamental classes of high-dimensional distributions: a single Gaussian, a product distribution on the hypercube, mixtures of two product distributions (under a natural balancedness condition), and k Gaussians with identical spherical covariances.

### Securing Distributed Gradient Descent in High Dimensional Statistical Learning

- Computer Science, Proc. ACM Meas. Anal. Comput. Syst.
- 2019

A secured variant of the gradient descent method that can tolerate up to a constant fraction of Byzantine workers, and establishes a uniform concentration of the sample covariance matrix of gradients, and shows that the aggregated gradient, as a function of model parameter, converges uniformly to the true gradient function.

### Qsparse-Local-SGD: Distributed SGD With Quantization, Sparsification, and Local Computations

- Computer Science, IEEE Journal on Selected Areas in Information Theory
- 2020

This paper proposes Qsparse-local-SGD algorithm, which combines aggressive sparsification with quantization and local computation along with error compensation, by keeping track of the difference between the true and compressed gradients, and demonstrates that it converges at the same rate as vanilla distributed SGD for many important classes of sparsifiers and quantizers.