Scalable Simple Random Sampling and Stratified Sampling

@inproceedings{Meng2013ScalableSR,
  title={Scalable Simple Random Sampling and Stratified Sampling},
  author={Xiangrui Meng},
  booktitle={ICML},
  year={2013}
}
Analyzing data sets of billions of records has now become a regular task in many companies and institutions. In the statistical analysis of such massive data sets, sampling generally plays an important role. In this work, we describe a scalable simple random sampling algorithm, named ScaSRS, which uses probabilistic thresholds to decide on the fly whether to accept, reject, or wait-list an item independently of others. We prove that, with high probability, it succeeds and needs only O(√k…
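The accept/reject/wait-list rule described above can be sketched in Python as follows. This is a minimal, single-machine illustration, not the paper's implementation. The names `scasrs` and `scasrs_thresholds`, the failure parameter `delta`, and the specific Bernstein-style threshold formulas are assumptions made for illustration, and the paper's exact thresholds may differ. Each item draws an independent uniform key: keys below a lower threshold are accepted, keys above an upper threshold are rejected, and only keys in between are wait-listed and later sorted to fill the sample up to exactly k items.

```python
import math
import random


def scasrs_thresholds(n, k, delta):
    """Return (q_accept, q_reject) for drawing k of n items.

    ASSUMPTION: Bernstein-style bounds chosen for illustration; the
    paper's exact constants may differ.
    """
    p = k / n
    log_d = math.log(delta)  # negative for 0 < delta < 1
    # Upper threshold: with high probability at least k keys fall below it.
    g1 = -log_d / n
    q_reject = min(1.0, p + g1 + math.sqrt(g1 * g1 + 2 * g1 * p))
    # Lower threshold: with high probability at most k keys fall below it.
    g2 = -2 * log_d / (3 * n)
    q_accept = max(0.0, p + g2 - math.sqrt(g2 * g2 + 3 * g2 * p))
    return q_accept, q_reject


def scasrs(items, k, delta=1e-4, rng=None):
    """Hypothetical helper: simple random sample of exactly k items."""
    if rng is None:
        rng = random.Random()
    n = len(items)
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= len(items)")
    if k == 0:
        return []
    q_accept, q_reject = scasrs_thresholds(n, k, delta)
    accepted, waitlist = [], []
    for item in items:
        u = rng.random()  # independent key; no other item is consulted
        if u < q_accept:
            accepted.append(item)  # accepted on the fly
        elif u < q_reject:
            waitlist.append((u, item))  # undecided: keep key for later
        # otherwise rejected and discarded immediately
    need = k - len(accepted)
    if need < 0 or need > len(waitlist):
        # Low-probability failure of the thresholds; caller may retry.
        raise RuntimeError("ScaSRS sketch failed; retry with a new seed")
    # Fill the sample with the wait-listed items having the smallest keys.
    waitlist.sort(key=lambda pair: pair[0])
    return accepted + [item for _, item in waitlist[:need]]


if __name__ == "__main__":
    print(scasrs(list("abcdefghij"), 3, rng=random.Random(0)))
```

Because the returned sample is always the k items with the smallest keys, it is a simple random sample whenever the run succeeds, whatever thresholds are used; the thresholds only govern how often the run fails and how large the wait-list grows. Since each decision depends only on that item's own key, the loop in this sketch could be split across data partitions, with only the wait-list gathered for the final sort.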