Corpus ID: 250644355

ReBoot: Distributed statistical learning via refitting Bootstrap samples

Yumeng Wang and Ziwei Zhu

In this paper, we study a one-shot distributed learning algorithm via refitting Bootstrap samples, which we refer to as ReBoot. Given the local models that are fit on multiple independent subsamples, ReBoot refits a new model on the union of the Bootstrap samples drawn from these local models. The whole procedure requires only one round of communication of model parameters. Theoretically, we analyze the statistical rate of ReBoot for generalized linear models (GLM) and noisy phase retrieval, which… 
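The refitting idea described above can be illustrated with a minimal sketch for a Gaussian linear model, one of the simplest GLMs. Several things here are assumptions rather than details from the paper: the covariates are taken to be standard normal and that distribution is assumed known to the central server, the noise level `noise_sd` is assumed known, and the helper names `fit_ols` and `reboot_linear` are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit; stands in for the local and central model fits."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def reboot_linear(local_betas, n_boot, rng, noise_sd=1.0):
    """Refit one model on the union of samples simulated from each local model.

    Assumption: covariates are standard normal and the noise is Gaussian
    with known standard deviation `noise_sd`.
    """
    d = local_betas[0].shape[0]
    X_parts, y_parts = [], []
    for beta_k in local_betas:
        # Draw a Bootstrap sample from the k-th fitted local model.
        X_b = rng.standard_normal((n_boot, d))
        y_b = X_b @ beta_k + noise_sd * rng.standard_normal(n_boot)
        X_parts.append(X_b)
        y_parts.append(y_b)
    # Pool all Bootstrap samples and refit a single model.
    return fit_ols(np.vstack(X_parts), np.concatenate(y_parts))

rng = np.random.default_rng(0)
d, m, n = 5, 10, 200                      # dimension, machines, local sample size
beta_true = np.arange(1.0, d + 1)

# Each machine fits on its own subsample and sends only its coefficients.
local_betas = []
for _ in range(m):
    X = rng.standard_normal((n, d))
    y = X @ beta_true + rng.standard_normal(n)
    local_betas.append(fit_ols(X, y))

beta_reboot = reboot_linear(local_betas, n_boot=2000, rng=rng)
print(np.linalg.norm(beta_reboot - beta_true))
```

Only the `m` coefficient vectors cross the network, which matches the single round of parameter communication the abstract mentions. For a linear model the refit behaves much like plain averaging; the models the paper analyzes (general GLMs, noisy phase retrieval) are where refitting and averaging can differ.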

Bootstrap Model Aggregation for Distributed Statistical Learning

This work proposes two variance reduction methods to correct the bootstrap noise, including a weighted M-estimator that is both statistically efficient and practically powerful.

On the Optimality of Averaging in Distributed Statistical Learning

Focusing on empirical risk minimization, or equivalently M-estimation, this paper studies the statistical error incurred by averaging and proves that, to leading order, averaging is as accurate as the centralized solution.

A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers

A unified framework for establishing consistency and convergence rates for regularized M-estimators under high-dimensional scaling is provided; one main theorem is stated, and it is shown how the theorem can be used to re-derive several existing results and to obtain several new ones.

Distributed Estimation, Information Loss and Exponential Families

A simple communication-efficient learning framework that first calculates the local maximum likelihood estimates (MLE) based on the data subsets, and then combines the local MLEs to achieve the best possible approximation to the global MLE given the whole dataset is studied.

Distributed Inference for Linear Support Vector Machine

This work proposes a multi-round distributed linear-type (MDL) estimator for conducting inference for linear SVM and establishes the Bahadur representation of the estimator, which shows that the MDL estimator achieves the optimal statistical efficiency, i.e., the same efficiency as the classical linear SVM applied to the entire data set in a single-machine setup.

A distributed one-step estimator

A one-step approach is proposed that enhances a simple-averaging-based distributed estimator with a single Newton–Raphson update, and the asymptotic properties of the resulting estimator are derived.
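As a rough illustration of the averaging-plus-one-Newton-step idea, the sketch below uses logistic regression with equally sized shards. It is not taken from the cited paper: the model choice, the helper names `grad_hess` and `local_mle`, and the simulated data are all assumptions made for the example.

```python
import numpy as np

def grad_hess(theta, X, y):
    """Gradient and Hessian of the average logistic negative log-likelihood."""
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    g = X.T @ (p - y) / len(y)
    H = (X * (p * (1.0 - p))[:, None]).T @ X / len(y)
    return g, H

def local_mle(X, y, iters=25):
    """Local maximum likelihood fit via plain Newton iterations."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        g, H = grad_hess(theta, X, y)
        theta = theta - np.linalg.solve(H, g)
    return theta

rng = np.random.default_rng(1)
theta_true = np.array([1.0, -0.5, 0.25])
m, n = 10, 500                            # machines, local sample size

# Simulated shards (hypothetical data, equal sizes on every machine).
shards = []
for _ in range(m):
    X = rng.standard_normal((n, len(theta_true)))
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ theta_true))).astype(float)
    shards.append((X, y))

# Round 1: average the local estimators.
theta_avg = np.mean([local_mle(X, y) for X, y in shards], axis=0)

# Round 2: each machine reports its gradient and Hessian at theta_avg;
# the center averages them and takes one Newton-Raphson step.
stats = [grad_hess(theta_avg, X, y) for X, y in shards]
g_bar = np.mean([g for g, _ in stats], axis=0)
H_bar = np.mean([H for _, H in stats], axis=0)
theta_one_step = theta_avg - np.linalg.solve(H_bar, g_bar)
print(theta_avg, theta_one_step)
```

Unlike a purely one-shot scheme, this sketch needs a second communication round to collect the gradients and Hessians evaluated at the averaged estimate.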

Communication-efficient Sparse Regression

This work devises a communication-efficient approach to distributed sparse regression in the high-dimensional setting, together with a new parallel and computationally efficient algorithm for computing the approximate inverse covariance required by the debiasing approach when the dataset is split across samples.

Efficient Distributed Learning with Sparsity

This work proposes a novel, efficient approach for distributed sparse learning in high dimensions, where observations are randomly partitioned across machines, and provably matches the estimation error bound of centralized methods within a constant number of communication rounds.

Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion, and Blind Deconvolution

By marrying statistical modeling with generic optimization theory, a general recipe for analyzing the trajectories of iterative algorithms via a leave-one-out perturbation argument is developed, establishing that gradient descent achieves near-optimal statistical and computational guarantees without explicit regularization.

Communication-Efficient Learning of Deep Networks from Decentralized Data

This work presents a practical method for the federated learning of deep networks based on iterative model averaging, and conducts an extensive empirical evaluation, considering five different model architectures and four datasets.