# A scalable bootstrap for massive data

@article{Kleiner2011ASB, title={A scalable bootstrap for massive data}, author={Ariel Kleiner and Ameet S. Talwalkar and Purnamrita Sarkar and Michael I. Jordan}, journal={Journal of the Royal Statistical Society: Series B (Statistical Methodology)}, year={2011}, volume={76} }

The bootstrap provides a simple and powerful means of assessing the quality of estimators. However, in settings involving large data sets—which are increasingly prevalent—the calculation of bootstrap‐based quantities can be prohibitively demanding computationally. Although variants such as subsampling and the m out of n bootstrap can be used in principle to reduce the cost of bootstrap computations, these methods are generally not robust to specification of tuning parameters (such as the number…
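As a rough illustration of the cost issue the abstract raises, the sketch below compares an ordinary Monte Carlo bootstrap with an m out of n bootstrap for the standard error of a sample mean. The function names, the choice of the mean as estimator, the replicate count, and the `sqrt(m / n)` rescaling (which presumes a root-n consistent estimator) are all assumptions made for this example and are not taken from the paper.

```python
import random
import statistics


def bootstrap_se(data, estimator, n_resamples=200, seed=0):
    """Monte Carlo bootstrap estimate of the standard error of `estimator`."""
    rng = random.Random(seed)
    n = len(data)
    # Each replicate recomputes the estimator on n points drawn with
    # replacement, so the total cost grows like n_resamples * n.
    replicates = [estimator(rng.choices(data, k=n)) for _ in range(n_resamples)]
    return statistics.stdev(replicates)


def m_out_of_n_se(data, estimator, m, n_resamples=200, seed=0):
    """m out of n bootstrap: each replicate uses only m < n resampled points.

    The spread of the replicates reflects sample size m, so it is rescaled
    by sqrt(m / n). That rescaling assumes a root-n consistent estimator
    (an assumption of this sketch); m is the tuning parameter.
    """
    rng = random.Random(seed)
    n = len(data)
    replicates = [estimator(rng.choices(data, k=m)) for _ in range(n_resamples)]
    return statistics.stdev(replicates) * (m / n) ** 0.5


# Synthetic data: 2000 standard normal draws, so the true standard error
# of the mean is about 1 / sqrt(2000).
data_rng = random.Random(42)
sample = [data_rng.gauss(0.0, 1.0) for _ in range(2000)]

se_full = bootstrap_se(sample, statistics.fmean)
se_small = m_out_of_n_se(sample, statistics.fmean, m=200)
print(f"bootstrap SE: {se_full:.4f}, m out of n SE: {se_small:.4f}")
```

The m out of n variant touches ten times fewer points per replicate here, but its output depends on the chosen `m` and on knowing the correct rescaling, which is the kind of tuning sensitivity the abstract refers to.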

## 321 Citations

The Big Data Bootstrap

- Computer Science
- ICML
- 2012

The Bag of Little Bootstraps (BLB), a new procedure which incorporates features of both the bootstrap and subsampling to obtain a robust, computationally efficient means of assessing estimator quality, is presented.
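A minimal sketch of how a BLB-style procedure can combine subsampling with the bootstrap, here for the standard error of a mean. It follows the procedure as it is commonly described: draw small subsets without replacement, resample each subset up to the full sample size n using multinomial counts, then average the per-subset quality estimates. The function names, the subset size exponent `gamma`, and the default numbers of subsets and resamples are assumptions chosen for illustration, not values prescribed by the authors.

```python
import random
import statistics


def weighted_mean(values, counts):
    """Mean of a resample stored as multiplicity counts over distinct values."""
    total = sum(counts)
    return sum(v * c for v, c in zip(values, counts)) / total


def blb_se(data, gamma=0.7, n_subsets=10, n_resamples=50, seed=0):
    """BLB-style standard error of the mean (illustrative sketch)."""
    rng = random.Random(seed)
    n = len(data)
    b = max(2, int(n ** gamma))  # subset size b = n^gamma (gamma is assumed)
    per_subset = []
    for _ in range(n_subsets):
        # Subsampling step: b distinct points, drawn without replacement.
        subset = rng.sample(data, b)
        replicates = []
        for _ in range(n_resamples):
            # Bootstrap step: a size-n resample of the subset, kept as
            # multinomial counts so only b distinct values are ever stored.
            counts = [0] * b
            for idx in rng.choices(range(b), k=n):
                counts[idx] += 1
            replicates.append(weighted_mean(subset, counts))
        per_subset.append(statistics.stdev(replicates))
    # Average the per-subset standard error estimates.
    return statistics.fmean(per_subset)


data_rng = random.Random(7)
sample = [data_rng.gauss(0.0, 1.0) for _ in range(2000)]
blb_estimate = blb_se(sample)
print(f"BLB-style SE of the mean: {blb_estimate:.4f}")
```

Because each resample has nominal size n, no rescaling of the result is needed, and each subset can be processed independently, for example on separate workers.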

A Subsampled Double Bootstrap for Massive Data

- Computer Science
- 2015

A new resampling method, the subsampled double bootstrap, is proposed; for a given time budget it is superior to BLB in running time and sample coverage, and it can be implemented automatically with fewer tuning parameters.

SFB 823 A subsampled double bootstrap for massive data

- Computer Science
- 2015

A new resampling method, the subsampled double bootstrap, is proposed; for a given time budget it is superior to BLB in running time and sample coverage, and it can be implemented automatically with fewer tuning parameters.

Fast and robust bootstrap in analysing large multivariate datasets

- Computer Science
- 2014 48th Asilomar Conference on Signals, Systems and Computers
- 2014

The proposed bootstrap method facilitates the use of highly robust statistical methods in analyzing large-scale data sets, with significant savings in computation: the estimator is not recomputed for each bootstrap sample but is instead obtained analytically using a smart approximation.

Robust, Scalable, and Fast Bootstrap Method for Analyzing Large Scale Data

- Computer Science
- IEEE Transactions on Signal Processing
- 2016

This paper proposes a scalable, statistically robust, and computationally efficient bootstrap method that is compatible with distributed processing and storage systems, and demonstrates the method's scalability, low complexity, and robust statistical performance in analyzing large data sets.

Scalable Statistical Inference Using Distributed Bootstrapping And Iterative ℓ1-Norm Minimization

- Computer Science
- 2018 52nd Asilomar Conference on Signals, Systems, and Computers
- 2018

This paper proposes a scalable distributed bootstrap method that uses iterative estimation equations favoring sparse solutions and gives a smaller root MSE and significantly lower bias than a bootstrap employing the widely used sparse estimator BPDN.

Hyperparameter Selection for Subsampling Bootstraps

- Mathematics, Computer Science
- 2020

A hyperparameter selection methodology is developed, which can be used to select tuning parameters for subsampling methods and finds an analytically simple and elegant relationship between the asymptotic efficiency of various subsampled estimators and their hyperparameters.

Sparsity-promoting bootstrap method for large-scale data

- Computer Science
- 2016 50th Asilomar Conference on Signals, Systems and Computers
- 2016

A scalable nonparametric bootstrap method is proposed that operates with a smaller number of distinct data points on multiple disjoint subsets of data and is compatible with distributed storage systems and with distributed and parallel processing architectures.

A Cheap Bootstrap Method for Fast Inference

- Computer Science
- 2022

This work presents a bootstrap methodology that uses minimal computation, namely with a resample effort as low as one Monte Carlo replication, while maintaining desirable statistical guarantees.

Variable Selection with Scalable Bootstrap in Generalized Linear Model for Massive Data

- Computer Science
- 2016

This paper proposes Variable Selection with Bag of Little Bootstraps (BLBVS) for general linear regression and extends it to the generalized linear model, selecting important parameters and assessing the quality of estimators in a computationally efficient way by analyzing the results of multiple bootstrap sub-samples.

## References

SHOWING 1-10 OF 33 REFERENCES

Richardson Extrapolation and the Bootstrap

- Mathematics
- 1988

Abstract Simulation methods [particularly Efron's (1979) bootstrap] are being applied more and more frequently in statistical inference. Given data (X_1, …, X_n) distributed according to P, which…

More Efficient Bootstrap Computations

- Mathematics, Economics
- 1990

Abstract This article concerns computational methods for the bootstrap that are more efficient than the straightforward Monte Carlo methods usually used. The bootstrap is considered in its simplest…

The Jackknife and the Bootstrap for General Stationary Observations

- Mathematics
- 1989

We extend the jackknife and the bootstrap method of estimating standard errors to the case where the observations form a general stationary sequence. We do not attempt a reduction to i.i.d. values.…

The stationary bootstrap

- Mathematics
- 1994

Abstract This article introduces a resampling procedure called the stationary bootstrap as a means of calculating standard errors of estimators and constructing confidence regions for parameters…

How Many Bootstraps

- Economics, Mathematics
- 1985

This document proposes an adaptive sequential method that estimates the accuracy of the bootstrap based on the current bootstrap samples until the estimated accuracy is high enough.

ON THE CHOICE OF m IN THE m OUT OF n BOOTSTRAP AND CONFIDENCE BOUNDS FOR EXTREMA

- Mathematics
- 2008

For i.i.d. samples of size n, the ordinary bootstrap (Efron (1979)) is known to be consistent in many situations, but it may fail in important examples (Bickel, Gotze and van Zwet (1997)). Using…

A note on methods of restoring consistency to the bootstrap

- Economics, Mathematics
- 2003

We consider the property of consistency and its relevance for determining the performance of the bootstrap. We analyse various parametric bootstrap approximations to the distributions of the Hodges…

Bootstrapping General Empirical Measures

- Mathematics
- 1990

It is proved that the bootstrapped central limit theorem for empirical processes indexed by a class of functions F and based on a probability measure P holds a.s. if and only if F ∈ CLT(P) and ∫ F dP…

Extrapolation and the bootstrap

- Economics
- 2002

The m out of n bootstrap, with or without replacement, where m→∞ and m/n→ 0 has been proposed on two grounds: (i) As a way of ensuring consistency when the classical bootstrap is not consistent. (ii)…

Computer Intensive Methods in Statistics

- Computer Science
- 1994

Four topics that have been treated in more detail were: Bayesian Computing; Interfacing Statistics and Computers; Image Analysis; Resampling Methods.