• Corpus ID: 220363838

Estimating Extreme Value Index by Subsampling for Massive Datasets with Heavy-Tailed Distributions

Authors: Yongxin Li, Liuju Chen, Deyuan Li, Hansheng Wang
Journal: arXiv: Methodology
Modern statistical analyses often encounter massive datasets with heavy-tailed distributions. At such scale, traditional estimation methods can hardly be applied directly to estimate the extreme value index. To address this issue, we propose a subsampling-based method. Specifically, multiple subsamples are drawn from the whole dataset by simple random subsampling with replacement. Based on each subsample, an approximate maximum likelihood…
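
The subsampling idea in the abstract above can be illustrated with a minimal sketch. The abstract is truncated, so the details here are assumptions rather than the authors' procedure: the per-subsample estimator is taken to be the classical Hill estimator, the subsample estimates are combined by a plain average, and the names `hill_estimator`, `subsampled_evi`, and the tuning value `k` are hypothetical.

```python
import math
import random


def hill_estimator(sample, k):
    """Hill estimator of the extreme value index from the k largest values.

    Assumption: stands in for the per-subsample estimator, which the
    truncated abstract does not fully specify. Requires positive data.
    """
    if not 1 <= k < len(sample):
        raise ValueError("k must satisfy 1 <= k < len(sample)")
    top = sorted(sample, reverse=True)[: k + 1]
    threshold = top[k]  # the (k+1)-th largest observation
    return sum(math.log(x / threshold) for x in top[:k]) / k


def subsampled_evi(data, n_subsamples, subsample_size, k, seed=0):
    """Average Hill estimates over simple random subsamples.

    Subsamples are drawn with replacement, as the abstract describes;
    averaging the estimates is an illustrative choice.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_subsamples):
        sub = rng.choices(data, k=subsample_size)  # sampling with replacement
        estimates.append(hill_estimator(sub, k))
    return sum(estimates) / len(estimates), estimates


if __name__ == "__main__":
    # Synthetic Pareto data with shape 2, i.e. extreme value index 1/2.
    rng = random.Random(42)
    gamma = 0.5
    data = [rng.paretovariate(1 / gamma) for _ in range(200_000)]
    est, _ = subsampled_evi(data, n_subsamples=50, subsample_size=2_000, k=200)
    print(f"averaged estimate: {est:.3f} (true index {gamma})")
```

Each subsample is small enough to sort in memory, which is the practical appeal when the full dataset is too large to process at once; how many subsamples to draw and how large to make them are choices this sketch does not address.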

Optimal Subsampling for Large Sample Logistic Regression
A two-step algorithm is developed to efficiently approximate the maximum likelihood estimate in logistic regression and derive optimal subsampling probabilities that minimize the asymptotic mean squared error of the resultant estimator.
Adaptive Huber Regression
A sharp phase transition is established for robust estimation of regression parameters in both low and high dimensions under a bounded (1+δ)-th moment condition: when δ ≥ 1, the estimator admits a sub-Gaussian-type deviation bound without sub-Gaussian assumptions on the data, while only a slower rate is available in the regime 0 < δ < 1, and the transition is smooth and optimal.
A Subsampled Double Bootstrap for Massive Data
A new resampling method, the subsampled double bootstrap, is proposed, which is superior to BLB in terms of running time, sample coverage, and automatic implementation with fewer tuning parameters for a given time budget.
Optimal subsampling for quantile regression in big data
We investigate optimal subsampling for quantile regression. We derive the asymptotic distribution of a general subsampling estimator and then derive two versions of optimal subsampling…
A scalable bootstrap for massive data
The ‘bag of little bootstraps’ (BLB) is introduced, a new procedure that incorporates features of both the bootstrap and subsampling to yield a robust, computationally efficient means of assessing the quality of estimators.
A statistical perspective on algorithmic leveraging
This work provides an effective framework to evaluate the statistical properties of algorithmic leveraging in the context of estimating parameters in a linear regression model and shows that from the statistical perspective of bias and variance, neither leverage-based sampling nor uniform sampling dominates the other.
Estimation of the generalized extreme-value distribution by the method of probability-weighted moments
We use the method of probability-weighted moments to derive estimators of the parameters and quantiles of the generalized extreme-value distribution. We investigate the properties of these estimators…
Tail Index Regression
In extreme value statistics, the tail index is an important measure to gauge the heavy-tailed behavior of a distribution. Under Pareto-type distributions, we employ the logarithmic function to link…
Aggregated estimating equation estimation
A computation- and storage-efficient algorithm is proposed for estimating equation (EE) estimation in massive data sets using a “divide-and-conquer” strategy; the resulting estimator is strongly consistent and asymptotically equivalent to the EE estimator.
A Comparison of the Stable and Student Distributions as Statistical Models for Stock Prices: Reply
There has been a great deal of discussion about the statistical distribution of rates of return on common stocks. At an early stage the prevalent belief was that distributions of rates of return on…