Corpus ID: 238856649

Byzantine-Robust Learning on Heterogeneous Datasets via Bucketing

@inproceedings{Karimireddy2020ByzantineRobustLO,
  title={Byzantine-Robust Learning on Heterogeneous Datasets via Bucketing},
  author={Sai Praneeth Karimireddy and Lie He and Martin Jaggi},
  year={2020}
}
In Byzantine robust distributed or federated learning, a central server wants to train a machine learning model over data distributed across multiple workers. However, a fraction of these workers may deviate from the prescribed algorithm and send arbitrary messages. While this problem has received significant attention recently, most current defenses assume that the workers have identical data. For realistic cases when the data across workers are heterogeneous (non-iid), we design new attacks… 
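The bucketing idea named in the title can be sketched in a few lines: worker gradients are randomly grouped into small buckets, averaged within each bucket, and only the bucket means are passed to an existing robust aggregator. The snippet below is a minimal illustration of that idea under these assumptions, not the paper's implementation; the names bucketing_aggregate, s, and robust_agg are placeholders, and coordinate-wise median is used only as an example aggregator.

import numpy as np

def bucketing_aggregate(grads, s, robust_agg, rng=None):
    # Randomly permute the worker gradients, split them into buckets of
    # at most s elements, average within each bucket, and hand the
    # bucket means to a robust aggregator of choice.
    rng = np.random.default_rng() if rng is None else rng
    perm = rng.permutation(len(grads))
    bucket_means = [
        np.mean([grads[j] for j in perm[i:i + s]], axis=0)
        for i in range(0, len(grads), s)
    ]
    return robust_agg(bucket_means)

# Example usage with coordinate-wise median as the robust aggregator.
grads = [np.random.randn(5) for _ in range(12)]
agg = bucketing_aggregate(grads, s=3,
                          robust_agg=lambda g: np.median(np.stack(g), axis=0))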
Byzantine Machine Learning Made Easy by Resilient Averaging of Momentums
Byzantine resilience emerged as a prominent topic within the distributed machine learning community. Essentially, the goal is to enhance distributed optimization algorithms, such as distributed SGD…
Byzantine-Robust Federated Learning with Optimal Statistical Rates and Privacy Guarantees
TLDR
It is remarked that the Byzantine-robust federated learning protocols with bucketing can be naturally combined with privacy-guaranteeing procedures to introduce security against a semi-honest server.
Byzantine-Resilient Decentralized Stochastic Optimization with Robust Aggregation Rules
TLDR
Following the guidelines, an iterative filtering-based robust aggregation rule termed iterative outlier scissor (IOS) is proposed, which has provable Byzantine-resilience and is shown to be effective in decentralized stochastic optimization.
zPROBE: Zero Peek Robustness Checks for Federated Learning
Privacy-preserving federated learning allows multiple users to jointly train a model with the coordination of a central server. The server only learns the final aggregation result, thereby preventing…
Strategyproof Learning: Building Trustworthy User-Generated Datasets
TLDR
This paper proposes the first personalized collaborative learning framework, LICCHAVI, with provable strategyproofness guarantees through a careful design of the underlying loss function, and proves that LICCHAVI is Byzantine resilient: it tolerates a minority of users that provide arbitrary data.
Variance Reduction is an Antidote to Byzantines: Better Rates, Weaker Assumptions and Communication Compression as a Cherry on the Top
TLDR
Theoretical convergence guarantees are derived for Byz-VR-MARINA that outperform the previous state of the art for general non-convex and Polyak-Łojasiewicz loss functions, together with the first analysis of a Byzantine-tolerant method supporting non-uniform sampling of stochastic gradients.
Communication-efficient distributed eigenspace estimation with arbitrary node failures
TLDR
An eigenspace estimation algorithm is presented for distributed environments with arbitrary node failures, where a subset of computing nodes can return structurally valid but otherwise arbitrarily chosen responses; the failure model includes three common forms of node-level corruption that cannot be easily detected by the central machine.
Distributed Newton-Type Methods with Communication Compression and Bernoulli Aggregation
TLDR
This work proves that the recently developed class of three-point compressors (3PC) of Richtárik et al. can be generalized to Hessian communication as well, and discovers several new 3PC mechanisms, such as adaptive thresholding and Bernoulli aggregation, which require reduced communication and occasional Hessian computations.

References

SHOWING 1-10 OF 49 REFERENCES
DRACO: Byzantine-resilient Distributed Training via Redundant Gradients
TLDR
DRACO is presented, a scalable framework for robust distributed training that uses ideas from coding theory and comes with problem-independent robustness guarantees, and is shown to be several times to orders of magnitude faster than median-based approaches.
RSA: Byzantine-Robust Stochastic Aggregation Methods for Distributed Learning from Heterogeneous Datasets
TLDR
This paper shows that RSA converges to a near-optimal solution, with a learning error dependent on the number of Byzantine workers, and that the convergence rate of RSA under Byzantine attacks is the same as that of stochastic gradient descent free of Byzantine attacks.
Byzantine-Robust Decentralized Learning via Self-Centered Clipping
TLDR
A Self-Centered Clipping (SCClip) algorithm is proposed for Byzantine-robust consensus and optimization, which is the first to provably converge to a $O(\delta_{\max}\zeta^2/\gamma^2)$ neighborhood of the stationary point for non-convex objectives under standard assumptions.
Byzantine-Resilient SGD in High Dimensions on Heterogeneous Data
TLDR
At the core of the algorithm, the polynomial-time outlier-filtering procedure for robust mean estimation proposed by Steinhardt et al. (ITCS 2018) is used to filter out corrupt gradients, giving a trade-off between the mini-batch size for stochastic gradients and the approximation error.
Robust Federated Learning in a Heterogeneous Environment
TLDR
A general statistical model is proposed that takes both the cluster structure of the users and the Byzantine machines into account, and statistical guarantees are proved for an outlier-robust clustering algorithm, which can be viewed as the Lloyd algorithm with robust estimation.
On the Byzantine Robustness of Clustered Federated Learning
TLDR
This work investigates the application of CFL to Byzantine settings, where a subset of clients behaves unpredictably or tries to disturb the joint training effort in a directed or undirected way, and demonstrates that CFL (without modifications) is able to reliably detect Byzantine clients and remove them from training.
Byzantine-Resilient High-Dimensional SGD with Local Iterations on Heterogeneous Data
TLDR
This work derives convergence results under minimal assumptions of bounded variance for SGD and bounded gradient dissimilarity in the statistically heterogeneous data setting, and is believed to be the first Byzantine-resilient algorithm and analysis with local iterations in the presence of malicious/Byzantine clients.
AGGREGATHOR: Byzantine Machine Learning via Robust Gradient Aggregation
TLDR
A framework that implements state-of-the-art robust (Byzantine-resilient) distributed stochastic gradient descent and quantifies the overhead of Byzantine resilience of AGGREGATHOR to 19% and 43% compared to vanilla TensorFlow.
Learning from History for Byzantine Robust Optimization
TLDR
This work presents two surprisingly simple strategies: a new robust iterative clipping procedure, and incorporating worker momentum to overcome time-coupled attacks, yielding the first provably robust method for the standard stochastic optimization setting.
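The iterative clipping procedure mentioned in this summary can be illustrated with a short sketch: the aggregate is repeatedly re-centered on the current estimate, and each worker momentum's deviation from that center is clipped before averaging. This is a hedged illustration of the general idea rather than the authors' code; the names centered_clip, tau, and iters are illustrative.

import numpy as np

def centered_clip(momenta, v0, tau=10.0, iters=3):
    # Iteratively re-center on the current estimate v and add the mean of
    # the deviations, each clipped to norm at most tau.
    v = np.asarray(v0, dtype=float)
    for _ in range(iters):
        clipped = []
        for m in momenta:
            d = np.asarray(m, dtype=float) - v
            norm = np.linalg.norm(d)
            clipped.append(d if norm <= tau else d * (tau / norm))
        v = v + np.mean(clipped, axis=0)
    return v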
Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent
TLDR
Krum is proposed, an aggregation rule that satisfies a resilience property capturing the basic requirements to guarantee convergence despite f Byzantine workers, and is argued to be the first provably Byzantine-resilient algorithm for distributed SGD.
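Krum's selection rule is simple enough to state as code: each candidate gradient is scored by the summed squared distance to its n - f - 2 closest peers, and the lowest-scoring gradient is returned. The sketch below is an illustrative NumPy rendering of that rule, with f denoting the assumed number of Byzantine workers.

import numpy as np

def krum(grads, f):
    # Score each gradient by the sum of squared distances to its
    # n - f - 2 nearest peers; return the gradient with the lowest score.
    X = np.stack([np.asarray(g, dtype=float) for g in grads])
    n = X.shape[0]
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    scores = []
    for i in range(n):
        others = np.sort(np.delete(d2[i], i))
        scores.append(others[: n - f - 2].sum())
    return X[int(np.argmin(scores))]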