Scalable Simple Random Sampling and Stratified Sampling

Author: Xiangrui Meng
Analyzing data sets with billions of records has become a routine task in many companies and institutions. In the statistical analysis of such massive data sets, sampling plays an important role. In this work, we describe a scalable simple random sampling algorithm, named ScaSRS, which uses probabilistic thresholds to decide on the fly whether to accept, reject, or wait-list each item independently of the others. We prove that, with high probability, it succeeds and needs only O(√k…
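
The accept / reject / wait-list rule described above can be illustrated with a minimal Python sketch. It is not the paper's reference implementation. The function name `scasrs_sketch`, the parameters `delta` and `max_tries`, and the retry-on-failure behavior are all assumptions made for illustration. The two thresholds are derived here from standard Chernoff and Bernstein tail bounds for a binomial count, so the constants may differ from those in the paper. Each item draws an independent uniform key; keys below the lower threshold are accepted immediately, keys above the upper threshold are rejected immediately, and only the keys in between are stored and sorted at the end.

```python
import math
import random


def scasrs_sketch(items, k, delta=1e-4, rng=None, max_tries=10):
    """Draw k items without replacement using accept/reject/wait-list thresholds.

    Hypothetical helper for illustration; thresholds are assumed
    Chernoff/Bernstein-style bounds, not necessarily the paper's constants.
    """
    rng = rng or random.Random()
    n = len(items)
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= len(items)")
    if k == 0:
        return []

    p = k / n
    g_hi = -math.log(delta) / n      # slack for the upper (reject) threshold
    g_lo = (2.0 / 3.0) * g_hi        # slack for the lower (accept) threshold
    # Below q_lo: with probability >= 1 - delta, at most k keys fall here.
    q_lo = max(0.0, p + g_lo - math.sqrt(g_lo * g_lo + 3.0 * g_lo * p))
    # Below q_hi: with probability >= 1 - delta, at least k keys fall here.
    q_hi = min(1.0, p + g_hi + math.sqrt(g_hi * g_hi + 2.0 * g_hi * p))

    for _ in range(max_tries):
        accepted = []
        waitlist = []
        for item in items:
            u = rng.random()         # independent uniform key per item
            if u < q_lo:
                accepted.append(item)            # accept on the fly
            elif u <= q_hi:
                waitlist.append((u, item))       # decide after the pass
            # else: reject on the fly, nothing is stored
        need = k - len(accepted)
        if 0 <= need <= len(waitlist):
            # The k smallest keys overall are the accepted items plus
            # the `need` smallest keys on the wait list.
            waitlist.sort(key=lambda pair: pair[0])
            return accepted + [item for _, item in waitlist[:need]]
        # Rare failure: too many accepted or too few candidates; try again.
    raise RuntimeError("sampling failed after max_tries passes")
```

For example, `scasrs_sketch(list(range(10000)), 100, rng=random.Random(0))` returns 100 distinct values from the input. When a pass succeeds, the result is exactly the set of items with the k smallest keys, which is what a sort-based simple random sample would return; the thresholds only avoid storing and sorting items whose fate is already clear.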