• Corpus ID: 208637300

Outlier detection and a tail-adjusted boxplot based on extreme value theory

@article{Bhattacharya2019OutlierDA,
  title={Outlier detection and a tail-adjusted boxplot based on extreme value theory},
  author={Shrijita Bhattacharya and Jan Beirlant},
  journal={arXiv: Methodology},
  year={2019}
}
Whether an extreme observation is an outlier depends strongly on the corresponding tail behaviour of the underlying distribution. We develop an automatic, data-driven method to identify extreme tail behaviour that deviates from the intermediate and central characteristics. This allows for detecting extreme outliers or sets of extreme data that show less spread than the bulk of the data. To this end we extend a testing method proposed in Bhattacharya et al. (2019) for the specific case of…
1 Citation
Prediction of Type II Diabetes Risk Based on XGBoost and 1D-CNN
In this paper, a machine learning method is used to accurately predict blood glucose levels from real physical examination data collected at a Grade-A Tertiary Hospital in China. The original data is

References

Data-adaptive trimming of the Hill estimator and detection of outliers in the extremes of heavy-tailed data
We introduce a trimmed version of the Hill estimator for the index of a heavy-tailed distribution, which is robust to perturbations in the extreme order statistics. In the ideal Pareto setting, the
An adjusted boxplot for skewed distributions
An adjustment of the boxplot is presented that includes a robust measure of skewness in the determination of the whiskers, which results in a more accurate representation of the data and of possible outliers.
A robust estimator for the tail index of Pareto-type distributions
A robust estimator of the tail index is proposed, by combining a refinement of the Pareto approximation for the conditional distribution of relative excesses over a large threshold with an integrated squared error approach on partial density component estimation.
Tail Index Estimation, Pareto Quantile Plots, and Regression Diagnostics
Abstract Successful application of extreme value statistics for estimating the Pareto tail index relies heavily on the choice of the number of extreme values taken into account. It is shown that
Statistics of Extremes: Theory and Applications
Research in the statistical analysis of extreme values has flourished over the past decade: new probability models, inference and data analysis techniques have been introduced; and new application
Extreme value theory : an introduction
This treatment of extreme value theory is unique in book literature in that it focuses on some beautiful theoretical results along with applications. All the main topics covering the heart of the
Clustering of Maxima: Spatial Dependencies among Heavy Rainfall in France
A novel algorithm based on taking advantage of multivariate extreme value theory, a well-developed research field in probability, and to adapt it to the context of spatial clustering is proposed.
Statistical Models in S
The interactive data analysis and graphics language S has become a popular environment for both data analysts and research statisticians, but a common complaint has concerned the lack of statistical modeling tools, such as those provided by GLIM© or GENSTAT©.
A similarity-based QSAR model for predicting acute toxicity towards the fathead minnow (Pimephales promelas)
A dataset of 908 chemicals was used to develop a QSAR model to predict the LC50 96 hours for the fathead minnow, which had good and balanced performance in internal and external validation, at the expense of a percentage of molecules outside the applicability domain.
Exploratory data analysis
  • J. Tukey
  • Psychology, Computer Science
    Addison-Wesley series in behavioral science : quantitative methods
  • 1977