Byzantine-Resilient SGD in High Dimensions on Heterogeneous Data

@article{Data2021ByzantineResilientSI,
  title={Byzantine-Resilient SGD in High Dimensions on Heterogeneous Data},
  author={Deepesh Data and Suhas N. Diggavi},
  journal={2021 IEEE International Symposium on Information Theory (ISIT)},
  year={2021},
  pages={2310-2315}
}
  • Deepesh Data, S. Diggavi
  • Published 16 May 2020
  • Computer Science
  • 2021 IEEE International Symposium on Information Theory (ISIT)
We study distributed stochastic gradient descent (SGD) in the master-worker architecture under Byzantine attacks. We consider the heterogeneous data model, where different workers may have different local datasets, and we do not make any probabilistic assumptions on data generation. At the core of our algorithm, we use the polynomial-time outlier-filtering procedure for robust mean estimation proposed by Steinhardt et al. (ITCS 2018) to filter out corrupt gradients. In order to be able to apply… 
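
To make the setting concrete, here is a minimal sketch of the master-worker loop the abstract describes. The norm-based filter below is only an illustrative stand-in for the Steinhardt et al. outlier-filtering procedure used in the paper; the function names (`robust_aggregate`, `byzantine_sgd`) and the fixed Byzantine fraction are assumptions made for this example.

```python
# Minimal sketch of a master-worker Byzantine-resilient SGD loop (illustrative).
import numpy as np

def robust_aggregate(grads, byz_frac=0.2):
    """Stand-in robust aggregator: drop the gradients farthest (in Euclidean
    norm) from the coordinate-wise median, then average the rest."""
    grads = np.asarray(grads)                        # shape (n_workers, d)
    center = np.median(grads, axis=0)
    dists = np.linalg.norm(grads - center, axis=1)
    keep = np.argsort(dists)[: int(len(grads) * (1 - byz_frac))]
    return grads[keep].mean(axis=0)

def byzantine_sgd(workers, x0, lr=0.1, steps=100):
    """`workers` is a list of callables mapping the model x to a (possibly
    corrupted) stochastic gradient; the master aggregates robustly."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        grads = [w(x) for w in workers]              # one gradient per worker
        x -= lr * robust_aggregate(grads)
    return x
```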

Byzantine-Resilient High-Dimensional SGD with Local Iterations on Heterogeneous Data

TLDR
This work is believed to be the first Byzantine-resilient algorithm and analysis with local iterations in the presence of malicious/Byzantine clients; it derives convergence results under minimal assumptions of bounded variance for SGD and bounded gradient dissimilarity in the statistically heterogeneous data setting.

On Byzantine-Resilient High-Dimensional Stochastic Gradient Descent

TLDR
The authors' algorithm can tolerate less than a $\frac{1}{3}$ fraction of Byzantine workers, can approximately find the optimal parameters exponentially fast in the strongly convex setting, and reaches an approximate stationary point at a linear rate in the non-convex setting, thus matching the convergence rates of vanilla SGD in the Byzantine-free setting.

Byzantine-Resilient High-Dimensional Federated Learning

TLDR
This work is believed to be the first Byzantine-resilient algorithm and analysis with local iterations in the presence of malicious/Byzantine clients, and it derives convergence results under minimal assumptions of bounded variance for SGD and bounded gradient dissimilarity (which captures heterogeneity among local datasets).

Byzantine-Robust Learning on Heterogeneous Datasets via Resampling

TLDR
This work proposes a simple resampling scheme that adapts existing robust algorithms to heterogeneous datasets at a negligible computational cost, and theoretically and experimentally validates the approach, showing that combining resampling with existing robust algorithms is effective against challenging attacks.

Byzantine-Robust Learning on Heterogeneous Datasets via Bucketing

TLDR
This work proposes a simple bucketing scheme that adapts existing robust algorithms to heterogeneous datasets at a negligible computational cost, and theoretically and experimentally validates the approach, showing that combining bucketing with existing robust algorithms is effective against challenging attacks.
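
As a rough illustration of the bucketing idea, the sketch below randomly groups worker gradients into small buckets and averages within each bucket before handing the results to an existing robust aggregator; the function name `bucketing` and the default bucket size are assumptions, not the paper's exact construction.

```python
import numpy as np

def bucketing(grads, s=2, rng=None):
    """Randomly permute the worker gradients, group them into buckets of size
    s, and return the bucket averages; averaging reduces the heterogeneity
    seen by whatever robust aggregator is applied afterwards."""
    rng = np.random.default_rng() if rng is None else rng
    grads = np.asarray(grads)
    perm = rng.permutation(len(grads))
    buckets = [perm[i:i + s] for i in range(0, len(grads), s)]
    return np.stack([grads[b].mean(axis=0) for b in buckets])

# Usage: feed the bucket averages to any existing robust rule, e.g.
#   aggregated = np.median(bucketing(grads), axis=0)
```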

Robust Training in High Dimensions via Block Coordinate Geometric Median Descent

TLDR
By applying the geometric median (GM) to only a judiciously chosen block of coordinates at a time and using a memory mechanism, one can retain the breakdown point of 1/2 for smooth non-convex problems, with non-asymptotic convergence rates comparable to those of SGD with the full GM, while achieving a significant speedup in training.
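
A hedged sketch of the block-coordinate geometric median idea: compute the geometric median (via Weiszfeld iterations) only on a randomly chosen block of coordinates. The paper's memory mechanism is omitted here, and the plain-mean fallback outside the block is an assumption made to keep the example short.

```python
import numpy as np

def geometric_median(points, iters=50, eps=1e-8):
    """Weiszfeld iterations for the geometric median of the row vectors."""
    z = points.mean(axis=0)
    for _ in range(iters):
        w = 1.0 / np.maximum(np.linalg.norm(points - z, axis=1), eps)
        z = (w[:, None] * points).sum(axis=0) / w.sum()
    return z

def block_gm_aggregate(grads, block_size, rng=None):
    """Apply the geometric median only on a randomly chosen coordinate block;
    the remaining coordinates fall back to a plain mean in this sketch."""
    rng = np.random.default_rng() if rng is None else rng
    grads = np.asarray(grads, dtype=float)
    d = grads.shape[1]
    block = rng.choice(d, size=min(block_size, d), replace=False)
    agg = grads.mean(axis=0)                  # cheap default outside the block
    agg[block] = geometric_median(grads[:, block])
    return agg
```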

A Simplified Convergence Theory for Byzantine Resilient Stochastic Gradient Descent

TLDR
A simplified convergence theory is presented for the generic Byzantine-resilient SGD method originally proposed by Blanchard et al.

Learning from History for Byzantine Robust Optimization

TLDR
This work presents two surprisingly simple strategies, a new robust iterative clipping procedure and the incorporation of worker momentum to overcome time-coupled attacks, yielding the first provably robust method for the standard stochastic optimization setting.
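
A minimal sketch of iterative (centered) clipping combined with worker momentum, assuming a clipping radius `tau` and a few refinement iterations; the exact constants and the momentum bookkeeping in the paper may differ.

```python
import numpy as np

def centered_clip(grads, center, tau=10.0, iters=3):
    """Iteratively clip the received vectors to a ball of radius tau around a
    running center and re-average; a sketch of robust iterative clipping."""
    grads = np.asarray(grads)
    v = np.asarray(center, dtype=float).copy()
    for _ in range(iters):
        diffs = grads - v
        norms = np.maximum(np.linalg.norm(diffs, axis=1, keepdims=True), 1e-12)
        v = v + (diffs * np.minimum(1.0, tau / norms)).mean(axis=0)
    return v

# Worker momentum: each worker keeps m_i <- (1 - beta) * g_i + beta * m_i and
# sends m_i instead of the raw gradient, which damps time-coupled attacks.
```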

Byzantine-Resilient Decentralized Stochastic Optimization with Robust Aggregation Rules

TLDR
Following the proposed guidelines, an iterative filtering-based robust aggregation rule termed iterative outlier scissor (IOS) is developed; it has provable Byzantine-resilience and is shown to be effective in decentralized stochastic optimization.
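
The sketch below shows a generic iterative filtering rule in this spirit: repeatedly discard the received vector farthest from the mean of the remaining set, then average what is left. It is not the paper's exact IOS rule or its decentralized weighting; `num_remove` is an assumed budget of suspected Byzantine neighbors.

```python
import numpy as np

def iterative_filter_mean(vectors, num_remove):
    """Repeatedly discard the vector farthest from the mean of the remaining
    set, then average what is left."""
    vectors = np.asarray(vectors)
    kept = list(range(len(vectors)))
    for _ in range(num_remove):
        mean = vectors[kept].mean(axis=0)
        dists = np.linalg.norm(vectors[kept] - mean, axis=1)
        kept.pop(int(np.argmax(dists)))       # drop the biggest outlier
    return vectors[kept].mean(axis=0)
```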

References

Showing 1-10 of 41 references

DRACO: Byzantine-resilient Distributed Training via Redundant Gradients

TLDR
DRACO is presented, a scalable framework for robust distributed training that uses ideas from coding theory and comes with problem-independent robustness guarantees; it is shown to be several times to orders of magnitude faster than median-based approaches.
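
As a toy illustration of the redundancy idea (not DRACO's more communication-efficient codes), the sketch below uses a simple repetition code: each gradient task is computed by r = 2f + 1 workers and the master keeps the value reported by a strict majority, which is correct when gradients are deterministic and at most f copies are corrupted.

```python
import numpy as np
from collections import Counter

def majority_decode(copies):
    """Given r = 2f + 1 copies of the same deterministic gradient, up to f of
    which may be corrupted, return the value reported by a strict majority."""
    keys = [np.asarray(c).tobytes() for c in copies]
    winner, count = Counter(keys).most_common(1)[0]
    assert count > len(copies) // 2, "no honest majority among the copies"
    return np.asarray(copies[keys.index(winner)])
```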

Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent

TLDR
Krum is proposed, an aggregation rule that satisfies a resilience property capturing the basic requirements needed to guarantee convergence despite f Byzantine workers; it is argued to be the first provably Byzantine-resilient algorithm for distributed SGD.
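
A short sketch of the Krum rule as described by Blanchard et al.: score each worker gradient by the summed squared distance to its n - f - 2 closest peers and return the lowest-scoring one. The vectorized distance computation is an implementation choice, not part of the rule itself.

```python
import numpy as np

def krum(grads, f):
    """Krum: return the worker gradient whose summed squared distance to its
    n - f - 2 nearest neighbours is smallest (requires n > 2f + 2)."""
    grads = np.asarray(grads)
    n = len(grads)
    sq_dists = ((grads[:, None, :] - grads[None, :, :]) ** 2).sum(axis=2)
    scores = []
    for i in range(n):
        others = np.delete(sq_dists[i], i)    # distances to all other workers
        scores.append(np.sort(others)[: n - f - 2].sum())
    return grads[int(np.argmin(scores))]
```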

Sparsified SGD with Memory

TLDR
This work analyzes Stochastic Gradient Descent with k-sparsification or compression (for instance top-k or random-k) and shows that this scheme converges at the same rate as vanilla SGD when equipped with error compensation.
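
A minimal sketch of top-k sparsification with error compensation for a single worker; the function names and the single-worker framing are assumptions, but the memory update follows the usual error-feedback pattern (transmit the sparsified update, keep the residual locally).

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude coordinates of v and zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def ef_step(x, grad, memory, lr, k):
    """One error-compensated step: transmit only a sparsified update and keep
    the untransmitted residual in local memory for the next round."""
    update = lr * grad + memory
    sparse = top_k(update, k)
    return x - sparse, update - sparse        # new model, new memory
```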

signSGD: compressed optimisation for non-convex problems

TLDR
SignSGD can get the best of both worlds: compressed gradients and SGD-level convergence rate, and the momentum counterpart of signSGD is able to match the accuracy and convergence speed of Adam on deep ImageNet models.
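
A compact sketch of the signSGD-with-majority-vote update, assuming each worker sends only coordinate-wise signs and the server applies a sign step of size `lr`.

```python
import numpy as np

def signsgd_step(x, worker_grads, lr):
    """Each worker sends only the sign of its gradient; the server takes a
    coordinate-wise majority vote and applies a sign update of size lr."""
    votes = np.sign(np.asarray(worker_grads)).sum(axis=0)
    return x - lr * np.sign(votes)
```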

Communication-Efficient and Byzantine-Robust Distributed Learning

TLDR
It is shown that, in the regime when the compression factor δ is constant and the dimension of the parameter space is fixed, the rate of convergence is not affected by the compression operation, and hence the algorithm effectively gets the compression for free.

Data Encoding Methods for Byzantine-Resilient Distributed Optimization

TLDR
A sparse encoding scheme is proposed that enables computationally efficient data encoding and works as efficiently in the streaming data setting as it does in the offline setting, in which all the data is available beforehand.

Byzantine-Robust Distributed Learning: Towards Optimal Statistical Rates

TLDR
A main result of this work is a sharp analysis of two robust distributed gradient descent algorithms based on median and trimmed mean operations, respectively, which are shown to achieve order-optimal statistical error rates for strongly convex losses.
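
The two aggregators analyzed in that work are straightforward to write down; the sketch below gives a coordinate-wise median and a coordinate-wise $\beta$-trimmed mean (drop the $\beta$-fraction largest and smallest values in every coordinate, average the rest), assuming $\beta < 1/2$.

```python
import numpy as np

def coordinate_median(grads):
    """Coordinate-wise median of the worker gradients."""
    return np.median(np.asarray(grads), axis=0)

def trimmed_mean(grads, beta):
    """Coordinate-wise beta-trimmed mean: drop the beta-fraction largest and
    smallest values in every coordinate, then average the rest (beta < 1/2)."""
    grads = np.sort(np.asarray(grads), axis=0)
    m = int(np.floor(beta * len(grads)))
    return grads[m: len(grads) - m].mean(axis=0)
```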

Robust Estimators in High Dimensions without the Computational Intractability

TLDR
This work obtains the first computationally efficient algorithms for agnostically learning several fundamental classes of high-dimensional distributions: a single Gaussian, a product distribution on the hypercube, mixtures of two product distributions (under a natural balancedness condition), and k Gaussians with identical spherical covariances.
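
A simplified sketch of filter-based robust mean estimation in this spirit: repeatedly remove the point with the largest squared projection onto the top eigenvector of the empirical covariance until the covariance's spectral norm falls below a threshold. The threshold value and the one-point-at-a-time removal are assumptions made for brevity; the actual algorithms use more careful scores and come with formal guarantees.

```python
import numpy as np

def filtered_mean(points, eps, threshold=9.0):
    """Iteratively remove the point with the largest squared projection onto
    the top eigenvector of the empirical covariance until its top eigenvalue
    drops below an (assumed) threshold, then return the mean of the rest."""
    pts = np.asarray(points, dtype=float)
    for _ in range(int(np.ceil(2 * eps * len(pts)))):
        centered = pts - pts.mean(axis=0)
        cov = centered.T @ centered / len(pts)
        eigvals, eigvecs = np.linalg.eigh(cov)   # ascending eigenvalues
        if eigvals[-1] <= threshold:
            break
        scores = (centered @ eigvecs[:, -1]) ** 2
        pts = np.delete(pts, int(np.argmax(scores)), axis=0)
    return pts.mean(axis=0)
```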

Securing Distributed Gradient Descent in High Dimensional Statistical Learning

TLDR
A secured variant of the gradient descent method that can tolerate up to a constant fraction of Byzantine workers, and establishes a uniform concentration of the sample covariance matrix of gradients, and shows that the aggregated gradient, as a function of model parameter, converges uniformly to the true gradient function.

Qsparse-Local-SGD: Distributed SGD With Quantization, Sparsification, and Local Computations

TLDR
This paper proposes the Qsparse-local-SGD algorithm, which combines aggressive sparsification with quantization and local computation along with error compensation, by keeping track of the difference between the true and compressed gradients, and demonstrates that it converges at the same rate as vanilla distributed SGD for many important classes of sparsifiers and quantizers.
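
A rough single-worker sketch of combining local steps, top-k sparsification, a crude sign-magnitude quantizer, and error compensation; the quantizer and the function signature are assumptions for illustration, not the paper's exact compressor.

```python
import numpy as np

def qsparse_local_step(x, grad_fn, memory, lr, k, local_steps):
    """One communication round for a single worker: run a few local SGD steps,
    compress the accumulated update with top-k plus a crude sign-magnitude
    quantizer, and keep the compression error in local memory."""
    x = np.asarray(x, dtype=float)
    x_local = x.copy()
    for _ in range(local_steps):
        x_local -= lr * grad_fn(x_local)
    update = (x - x_local) + memory              # accumulated local progress
    idx = np.argpartition(np.abs(update), -k)[-k:]
    compressed = np.zeros_like(update)
    compressed[idx] = np.abs(update[idx]).mean() * np.sign(update[idx])
    return compressed, update - compressed       # message to master, new memory

# The master would average the workers' messages and apply x -= mean(compressed).
```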