Publications
XORing Elephants: Novel Erasure Codes for Big Data
TLDR
A novel family of erasure codes is presented that is efficiently repairable and offers higher reliability than Reed-Solomon codes, with reliability orders of magnitude higher than replication.
Speeding Up Distributed Machine Learning Using Codes
TLDR
This paper focuses on two of the most basic building blocks of distributed learning algorithms, matrix multiplication and data shuffling, and uses codes to reduce communication bottlenecks by exploiting excess storage.
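As a rough illustration of the coded-computation idea behind the matrix-multiplication part (a minimal sketch, not the paper's exact construction; all names and parameters below are illustrative), the following encodes the row blocks of A so that A·x can be recovered from any k of n worker results, making up to n − k stragglers harmless:

```python
import numpy as np

def coded_matvec(A, x, n=5, k=3, rng=np.random.default_rng(0)):
    """Toy (n, k)-coded matrix-vector multiply: tolerates n - k stragglers."""
    m = A.shape[0]
    assert m % k == 0, "for simplicity, rows must split evenly into k blocks"
    blocks = np.split(A, k, axis=0)              # k row blocks of A

    # Encode: each "worker" stores a random linear combination of the blocks
    # (a random generator matrix has any k rows invertible almost surely).
    G = rng.standard_normal((n, k))
    coded = [sum(G[i, j] * blocks[j] for j in range(k)) for i in range(n)]

    # Each worker computes its coded block times x; pretend only k of them return.
    worker_results = {i: coded[i] @ x for i in range(n)}
    survivors = sorted(rng.choice(n, size=k, replace=False))

    # Decode: invert the k x k submatrix of G belonging to the survivors.
    G_sub = G[survivors, :]
    Y = np.stack([worker_results[i] for i in survivors])
    decoded_blocks = np.linalg.solve(G_sub, Y)   # recovers blocks[j] @ x
    return np.concatenate(decoded_blocks)

A = np.random.default_rng(1).standard_normal((6, 4))
x = np.ones(4)
assert np.allclose(coded_matvec(A, x), A @ x)
```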
Federated Learning with Matched Averaging
TLDR
This work proposes the Federated Matched Averaging (FedMA) algorithm, designed for federated learning of modern neural network architectures such as convolutional neural networks (CNNs) and LSTMs, and shows that FedMA outperforms popular state-of-the-art federated learning algorithms on deep CNN and LSTM architectures trained on real-world datasets while improving communication efficiency.
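FedMA itself uses Bayesian nonparametric matching; the sketch below only illustrates the underlying "permute before averaging" idea for one hidden layer of two clients, using a plain Hungarian assignment (scipy's linear_sum_assignment) as a stand-in:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def matched_average(W1, W2):
    """Average two clients' hidden layers after permuting client 2's neurons
    (rows) to best match client 1's. A toy stand-in for FedMA's matching."""
    # Cost of assigning neuron j of client 2 to neuron i of client 1.
    cost = np.linalg.norm(W1[:, None, :] - W2[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)     # optimal one-to-one matching
    W2_permuted = W2[cols]                       # reorder client 2's neurons
    return 0.5 * (W1 + W2_permuted)

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))                                # 8 neurons, 16 inputs
W_client2 = W[rng.permutation(8)] + 0.01 * rng.standard_normal((8, 16))
avg = matched_average(W, W_client2)
print(np.abs(avg - W).max())     # small: matching undoes the neuron permutation
```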
Locally Repairable Codes
TLDR
This paper explores the repair metric of locality, which corresponds to the number of disk accesses required during a single node repair, derives a tradeoff between locality and code distance, and shows the existence of optimal locally repairable codes (LRCs) that achieve this tradeoff.
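A minimal toy of what locality buys (illustrative layout, not the paper's optimal LRC construction): with local XOR parities, a single lost block is repaired from its small local group rather than from all k data blocks:

```python
import numpy as np

# Toy layout: k = 4 data blocks in two local groups of size 2, each group
# protected by one XOR parity. Repairing one lost block then reads only
# r = 2 blocks instead of the k = 4 a classical MDS repair would touch.
rng = np.random.default_rng(0)
data = [rng.integers(0, 256, size=8, dtype=np.uint8) for _ in range(4)]

groups = {"g0": [0, 1], "g1": [2, 3]}
parity = {g: data[i] ^ data[j] for g, (i, j) in groups.items()}  # local XOR parities

# Repair block 2 (lost) from its local group only: surviving block 3 plus parity g1.
repaired = data[3] ^ parity["g1"]
assert np.array_equal(repaired, data[2])
```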
Perturbed Iterate Analysis for Asynchronous Stochastic Optimization
TLDR
Using the perturbed iterate framework, this work provides new analyses of the Hogwild! algorithm and asynchronous stochastic coordinate descent, that are simpler than earlier analyses, remove many assumptions of previous models, and in some cases yield improved upper bounds on the convergence rates.
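For context, the lock-free update pattern being analyzed looks roughly like the sketch below (a toy Hogwild!-style run on a sparse least-squares problem; problem sizes, step size, and thread count are illustrative):

```python
import numpy as np
import threading

# Sparse least-squares problem: each sample touches only a few coordinates,
# the regime where lock-free ("Hogwild!") updates rarely conflict.
rng = np.random.default_rng(0)
d, n_samples = 100, 2000
X = np.zeros((n_samples, d))
for row in X:
    row[rng.choice(d, size=3, replace=False)] = rng.standard_normal(3)
w_true = rng.standard_normal(d)
y = X @ w_true

w = np.zeros(d)                                  # shared parameters, no locks

def worker(seed, steps=4000, lr=0.05):
    local_rng = np.random.default_rng(seed)
    for _ in range(steps):
        i = local_rng.integers(n_samples)
        grad = (X[i] @ w - y[i]) * X[i]          # gradient of 0.5 * (x_i^T w - y_i)^2
        idx = np.nonzero(X[i])[0]
        w[idx] -= lr * grad[idx]                 # update only the touched coordinates

threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print("relative residual:", np.linalg.norm(X @ w - y) / np.linalg.norm(y))
```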
Interference Alignment as a Rank Constrained Rank Minimization
TLDR
The proposed algorithm attains perfect interference alignment in many cases and, in some cases, outperforms previous approaches for finding precoding and receive matrices for interference alignment.
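For reference, the standard interference-alignment conditions that such precoders V_j and receive filters U_k must satisfy are shown below (textbook conditions, not quoted from the paper); the paper's reformulation treats the interference terms as matrices whose rank is to be minimized subject to the signal-rank constraint.

```latex
% IA conditions for user k in a K-user interference channel with channels H_{kj}
U_k^{H} H_{kj} V_j = 0 \quad \text{for all } j \neq k,
\qquad \operatorname{rank}\left( U_k^{H} H_{kk} V_k \right) = d_k
```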
ATOMO: Communication-efficient Learning via Atomic Sparsification
TLDR
ATOMO is presented, a general framework for atomic sparsification of stochastic gradients; it is shown that methods such as QSGD and TernGrad are special cases of ATOMO, and that sparsifying gradients in their singular value decomposition (SVD) basis can lead to significantly faster distributed training.
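A minimal sketch of unbiased sparsification in the SVD basis (the sampling probabilities here are a simple proportional rule, not ATOMO's variance-optimal choice; all names are illustrative):

```python
import numpy as np

def svd_sparsify(G, budget=2, rng=np.random.default_rng(0)):
    """Unbiased sparsification of a gradient matrix G in its SVD basis:
    keep each singular triple with probability p_i and rescale by 1/p_i."""
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    p = np.minimum(1.0, budget * s / s.sum())    # expected number kept ~ budget
    keep = rng.random(len(s)) < p
    # Only the kept triples (u_i, sigma_i / p_i, v_i) would be sent; reconstruct here.
    return (U[:, keep] * (s[keep] / p[keep])) @ Vt[keep]

rng = np.random.default_rng(1)
G = rng.standard_normal((20, 10))
avg = np.mean([svd_sparsify(G, budget=3, rng=np.random.default_rng(i))
               for i in range(5000)], axis=0)
print(np.abs(avg - G).max())   # shrinks as the sample count grows: unbiased estimator
```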
DRACO: Byzantine-resilient Distributed Training via Redundant Gradients
TLDR
DRACO is presented, a scalable framework for robust distributed training that uses ideas from coding theory and comes with problem-independent robustness guarantees, and is shown to be several times to orders of magnitude faster than median-based approaches.
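The simplest instance of the redundant-gradient idea is a repetition code with majority decoding, sketched below (DRACO's general framework also covers more storage-efficient codes; everything here is illustrative):

```python
import numpy as np

def majority_gradient(replicas):
    """Return the value reported by a majority of replicas of the same task.
    Correct as long as fewer than half of the replicas are adversarial."""
    for candidate in replicas:
        votes = sum(np.array_equal(candidate, other) for other in replicas)
        if votes > len(replicas) // 2:
            return candidate
    raise ValueError("no majority: too many adversarial replicas")

rng = np.random.default_rng(0)
true_grads = [rng.standard_normal(4) for _ in range(3)]      # one gradient per partition

# Each partition's gradient is computed by r = 3 workers; one replica per partition lies.
reports = [[g.copy(), g.copy(), rng.standard_normal(4)] for g in true_grads]

recovered = [majority_gradient(r) for r in reports]
aggregate = np.mean(recovered, axis=0)                       # clean aggregated gradient
assert all(np.array_equal(a, b) for a, b in zip(recovered, true_grads))
```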
Locality and Availability in Distributed Storage
TLDR
It is shown that it is possible to construct codes that can support a scaling number of parallel reads while keeping the rate an arbitrarily high constant, and that this is possible with the minimum Hamming distance arbitrarily close to the Singleton bound.
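A toy picture of availability (illustrative, not the paper's construction): a symbol protected by two disjoint repair groups can be served to two readers in parallel, each reconstructing it from a different group:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, c, d = (rng.integers(0, 256, size=8, dtype=np.uint8) for _ in range(4))

# Symbol `a` has availability 2: two disjoint repair groups, each with an XOR parity.
p1 = a ^ b          # group 1: {b, p1} reconstructs a
p2 = a ^ c ^ d      # group 2: {c, d, p2} reconstructs a

# Two readers can fetch `a` in parallel without touching the same disk twice.
read_1 = b ^ p1
read_2 = c ^ d ^ p2
assert np.array_equal(read_1, a) and np.array_equal(read_2, a)
```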
Speeding up distributed machine learning using codes
TLDR
This work views distributed machine learning algorithms through a coding-theoretic lens, and shows how codes can equip them with robustness against system noise, including stragglers in matrix multiplication.
...