# Estimating Extreme Value Index by Subsampling for Massive Datasets with Heavy-Tailed Distributions

@article{Li2020EstimatingEV, title={Estimating Extreme Value Index by Subsampling for Massive Datasets with Heavy-Tailed Distributions}, author={Yongxin Li and Liuju Chen and Deyuan Li and Hansheng Wang}, journal={arXiv: Methodology}, year={2020} }

Modern statistical analyses often encounter datasets with massive sizes and heavy-tailed distributions. For datasets with massive sizes, traditional estimation methods can hardly be used to estimate the extreme value index directly. To address the issue, we propose here a subsampling-based method. Specifically, multiple subsamples are drawn from the whole dataset by using the technique of simple random subsampling with replacement. Based on each subsample, an approximate maximum likelihood…
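The subsampling step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes the Hill estimator as the per-subsample approximate maximum likelihood estimator, and it assumes the subsample estimates are combined by a plain average; the abstract is truncated before it states either detail. The names `hill_estimator`, `subsample_evi`, `n_subsamples`, `subsample_size`, and `k` are hypothetical.

```python
import math
import random


def hill_estimator(sample, k):
    """Hill estimator of the extreme value index from the k largest values.

    Assumption: this stands in for the per-subsample approximate MLE.
    """
    if not 0 < k < len(sample):
        raise ValueError("k must satisfy 0 < k < len(sample)")
    ordered = sorted(sample, reverse=True)
    threshold = ordered[k]  # the (k+1)-th largest observation
    if threshold <= 0:
        raise ValueError("threshold must be positive for the log ratio")
    return sum(math.log(x / threshold) for x in ordered[:k]) / k


def subsample_evi(data, n_subsamples, subsample_size, k, seed=0):
    """Draw subsamples with replacement and average their Hill estimates.

    Assumption: a simple average is used as the combination rule.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_subsamples):
        # Simple random subsampling with replacement from the full dataset.
        subsample = rng.choices(data, k=subsample_size)
        estimates.append(hill_estimator(subsample, k))
    return sum(estimates) / len(estimates), estimates


if __name__ == "__main__":
    # Synthetic Pareto data with shape 2, so the true index is 1 / 2.
    rng = random.Random(42)
    data = [rng.paretovariate(2.0) for _ in range(100_000)]
    estimate, _ = subsample_evi(data, n_subsamples=50, subsample_size=2000, k=100)
    print(f"averaged subsample estimate: {estimate:.3f}")
```

Each call to `hill_estimator` sorts only one subsample, so the cost per subsample depends on `subsample_size` rather than on the size of the full dataset.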

## References


Optimal Subsampling for Large Sample Logistic Regression

- Mathematics, Computer Science
- Journal of the American Statistical Association
- 2018

A two-step algorithm is developed to efficiently approximate the maximum likelihood estimate in logistic regression and derive optimal subsampling probabilities that minimize the asymptotic mean squared error of the resultant estimator.

Adaptive Huber Regression

- Computer Science
- Journal of the American Statistical Association
- 2020

A sharp phase transition is established for robust estimation of regression parameters in both low and high dimensions: in one moment regime the estimator admits a sub-Gaussian-type deviation bound without sub-Gaussian assumptions on the data, while only a slower rate is available in the other regime, and the transition is smooth and optimal.

A Subsampled Double Bootstrap for Massive Data

- Computer Science
- 2015

A new resampling method, the subsampled double bootstrap, is proposed; for a given time budget it is superior to BLB in running time, sample coverage, and automatic implementation with fewer tuning parameters.

Optimal subsampling for quantile regression in big data

- Mathematics
- Biometrika
- 2020

We investigate optimal subsampling for quantile regression. We derive the asymptotic distribution of a general subsampling estimator and then derive two versions of optimal subsampling…

A scalable bootstrap for massive data

- Computer Science
- 2011

The ‘bag of little bootstraps’ (BLB) is introduced, a new procedure that incorporates features of both the bootstrap and subsampling to yield a robust, computationally efficient means of assessing the quality of estimators.

A statistical perspective on algorithmic leveraging

- Computer Science
- J. Mach. Learn. Res.
- 2015

This work provides an effective framework to evaluate the statistical properties of algorithmic leveraging in the context of estimating parameters in a linear regression model and shows that from the statistical perspective of bias and variance, neither leverage-based sampling nor uniform sampling dominates the other.

Estimation of the generalized extreme-value distribution by the method of probability-weighted moments

- Mathematics
- 1985

We use the method of probability-weighted moments to derive estimators of the parameters and quantiles of the generalized extreme-value distribution. We investigate the properties of these estimators…

Tail Index Regression

- Mathematics
- 2009

In extreme value statistics, the tail index is an important measure to gauge the heavy-tailed behavior of a distribution. Under Pareto-type distributions, we employ the logarithmic function to link…

Aggregated estimating equation estimation

- Computer Science, Mathematics
- 2011

A computation and storage efficient algorithm for estimating equation (EE) estimation in massive data sets using a “divide-and-conquer” strategy that is strongly consistent and asymptotically equivalent to the EE estimator.

A Comparison of the Stable and Student Distributions as Statistical Models for Stock Prices: Reply

- Economics
- 1974

There has been a great deal of discussion about the statistical distribution of rates of return on common stocks. At an early stage the prevalent belief was that distributions of rates of return on…