Measure of Strength of Evidence for Visually Observed Differences between Subpopulations
@inproceedings{Yang2021MeasureOS,
  title={Measure of Strength of Evidence for Visually Observed Differences between Subpopulations},
  author={Xi Yang and Jan Hannig and Katherine A. Hoadley and Iain Carmichael and J. S. Marron},
  year={2021}
}
An increasingly important data-analytic challenge is understanding the relationships between subpopulations. Visualization methods such as PCA, t-SNE, and UMAP, which provide many useful insights into those relationships, are popular, especially in high-dimensional contexts such as bioinformatics. While visualization is often insightful, it can also be deceptive. This motivates the need for careful assessment of the strength of the evidence for differences between subpopulations. Because…
References (showing 10 of 17)
Direction-Projection-Permutation for High-Dimensional Hypothesis Tests
- Computer Science
- 2013
A computational tool called direction-projection-permutation (DiProPerm) is proposed, which rigorously assesses whether a binary linear classifier is detecting statistically significant differences between two high-dimensional distributions.
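The three steps in the name can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a mean-difference direction as a stand-in for the DWD or SVM directions typically used, and the difference of projected class means as the test statistic. The function names and parameters (`diproperm`, `n_perm`, `seed`) are hypothetical.

```python
import numpy as np


def mean_difference_direction(X, y):
    """Direction step (stand-in): unit vector from the class -1 mean to the class +1 mean."""
    d = X[y == 1].mean(axis=0) - X[y == -1].mean(axis=0)
    return d / np.linalg.norm(d)


def projected_mean_difference(X, y):
    """Projection step: project onto the fitted direction and compare class means."""
    scores = X @ mean_difference_direction(X, y)
    return scores[y == 1].mean() - scores[y == -1].mean()


def diproperm(X, y, n_perm=200, seed=0):
    """Permutation step: refit the direction on every shuffled labeling, so the null
    distribution reflects that the direction itself was learned from the data."""
    rng = np.random.default_rng(seed)
    observed = projected_mean_difference(X, y)
    null = np.array([projected_mean_difference(X, rng.permutation(y))
                     for _ in range(n_perm)])
    # Count the observed labeling among the permutations (Monte Carlo p-value).
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value


# Usage: two groups of 20 in 200 dimensions, shifted in the first coordinate only.
rng = np.random.default_rng(1)
y = np.repeat([1, -1], 20)
X = rng.standard_normal((40, 200))
X[y == 1, 0] += 5.0
stat, p = diproperm(X, y)
print(f"statistic = {stat:.2f}, p = {p:.3f}")
```

Refitting the direction inside the loop is the essential design choice: fixing a direction fitted to the real labels and only permuting afterward would make even pure noise look strongly separated in high dimensions.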
Properties of Balanced Permutations
- Mathematics · J. Comput. Biol.
- 2009
It turns out that balanced permutation reference distributions do not have the correct null behavior, which can be traced to their lack of a group structure, and they can give p-values that are too permissive to varying degrees.
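One symptom of the missing group structure is easy to check by enumeration. Assuming the usual definition, in which each relabeled group keeps exactly half of its original members, the observed labeling (the identity permutation) is never itself a balanced relabeling, so the set of balanced permutations cannot be a group. The sketch below assumes two equal groups of even size; `balanced_relabelings` is a hypothetical helper.

```python
from itertools import combinations
from math import comb


def balanced_relabelings(n):
    """All balanced relabelings of two groups of size n (n even), each returned as the
    new group A: exactly n/2 indices kept from the original A = {0..n-1} and n/2 taken
    from the original B = {n..2n-1}."""
    half = n // 2
    return [frozenset(keep) | frozenset(move)
            for keep in combinations(range(n), half)
            for move in combinations(range(n, 2 * n), half)]


n = 4
balanced = balanced_relabelings(n)
observed = frozenset(range(n))  # the identity: the original group A
print(len(balanced), comb(2 * n, n), observed in balanced)  # 36 70 False
```

Only 36 of the 70 possible relabelings are balanced, and the observed one is excluded by construction.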
Geometric representation of high dimension, low sample size data
- Mathematics
- 2005
This analysis shows that, as the dimension grows with the sample size held fixed, the data tend to lie deterministically at the vertices of a regular simplex, which means all the randomness in the data appears only as a random rotation of this simplex.
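The limit can be checked with a short simulation, assuming i.i.d. standard Gaussian coordinates, the simplest case covered by such results. For two independent points, $\|x-y\|^2=\sum_k (x_k-y_k)^2$ is a sum of $d$ terms with mean 2, so $\|x-y\|/\sqrt{d}\to\sqrt{2}$ and all pairwise distances become equal. `scaled_pairwise_distances` is a hypothetical helper.

```python
from itertools import combinations

import numpy as np


def scaled_pairwise_distances(n, d, seed=0):
    """Draw n points from N(0, I_d) and return all pairwise distances divided by sqrt(d)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    return np.array([np.linalg.norm(X[i] - X[j]) / np.sqrt(d)
                     for i, j in combinations(range(n), 2)])


for d in (10, 100, 10_000):
    dist = scaled_pairwise_distances(n=5, d=d)
    # As d grows with n fixed, every scaled distance approaches sqrt(2) (about 1.414),
    # i.e. the points sit near the vertices of a regular simplex.
    print(d, dist.min().round(3), dist.max().round(3))
```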
Fast Algorithms for Large-Scale Generalized Distance Weighted Discrimination
- Computer Science
- 2016
This work designs a scalable and robust algorithm for solving large-scale generalized DWD problems that is sometimes even more efficient than the highly optimized LIBLINEAR and LIBSVM solvers on the corresponding SVM problems.
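The solver itself is not reproduced here. As context, the sketch below evaluates the generalized DWD loss in the loss-function form often used in the literature, assuming an exponent q > 0, with q = 1 recovering the standard DWD loss. It is linear for small margins and switches to a decaying power tail at u = q/(q+1), where both its value and slope match. `gdwd_loss` is a hypothetical name.

```python
import numpy as np


def gdwd_loss(u, q=1.0):
    """Generalized DWD loss of the functional margin u = y * f(x), exponent q > 0.

    1 - u for u <= q/(q+1); q^q / (q+1)^(q+1) * u^(-q) beyond that point.
    q = 1 gives the standard DWD loss: 1 - u, then 1 / (4u)."""
    u = np.asarray(u, dtype=float)
    threshold = q / (q + 1.0)
    const = q**q / (q + 1.0) ** (q + 1.0)
    # Clamp so the tail is only evaluated on positive values (no division by zero).
    safe_u = np.maximum(u, threshold)
    return np.where(u <= threshold, 1.0 - u, const / safe_u**q)


print(gdwd_loss([-1.0, 0.0, 0.5, 1.0, 2.0]).tolist())  # [2.0, 1.0, 0.5, 0.25, 0.125]
```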
Distance‐weighted discrimination
- Computer Science
- 2015
A useful property of distance‐weighted discrimination, beyond good classification performance, is that it provides a direction vector in high‐dimensional data space that serves several purposes, including indicating the phenomena driving class differences, data visualization, and batch adjustment.
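How such a direction vector is obtained can be sketched with a deliberately simple solver: plain gradient descent on the standard (q = 1) DWD loss plus a ridge penalty. This assumes the loss-function formulation rather than the second-order cone programs or dedicated algorithms used in practice. `dwd_direction` and its parameters `lam`, `lr`, and `n_iter` are hypothetical.

```python
import numpy as np


def dwd_loss_grad(u):
    """Derivative of the standard DWD loss (q = 1): -1 for u <= 1/2, else -1 / (4 u^2)."""
    return np.where(u <= 0.5, -1.0, -1.0 / (4.0 * np.maximum(u, 0.5) ** 2))


def dwd_direction(X, y, lam=0.1, lr=0.05, n_iter=2000):
    """Minimize mean DWD loss + lam * ||w||^2 by gradient descent; return the unit
    direction w / ||w|| and the intercept rescaled by the same factor."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iter):
        u = y * (X @ w + b)          # functional margins
        g = dwd_loss_grad(u) * y     # derivative of each loss term w.r.t. f(x_i)
        w -= lr * (X.T @ g / n + 2 * lam * w)
        b -= lr * g.mean()
    norm = np.linalg.norm(w)
    return w / norm, b / norm


# Usage: 30 points per class in 20 dimensions, separated along the first coordinate.
rng = np.random.default_rng(0)
y = np.repeat([1.0, -1.0], 30)
X = rng.standard_normal((60, 20))
X[:, 0] += 2.0 * y
w, b = dwd_direction(X, y)
scores = X @ w  # projections onto the direction, e.g. for plotting or a DiProPerm-style test
print(np.argmax(np.abs(w)), np.mean(np.sign(scores + b) == y))
```

The projections `scores` are what a one-dimensional visualization along the DWD direction would display.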
Exact testing with random permutations
- Mathematics, Computer Science · Test
- 2018
This paper provides an alternative proof that permutation tests based on a random sample of permutations are exact, viewing the test as a “conditional Monte Carlo test” as it has been called in the literature; the results can also be used to prove properties of various multiple testing procedures based on random permutations.
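A random-permutation test in this spirit can be sketched as below. It assumes a one-sided difference-in-means statistic and treats the observed labeling as the identity permutation, counting it among the n_perm + 1 statistics; including the identity is what keeps such Monte Carlo p-values from being anti-conservative. `random_permutation_pvalue` is a hypothetical name.

```python
import numpy as np


def random_permutation_pvalue(x, y, n_perm=999, seed=0):
    """One-sided two-sample test of mean(x) > mean(y) using n_perm random permutations.

    The observed labeling plays the role of the identity permutation and is counted
    among the n_perm + 1 statistics, so the p-value is never below 1 / (n_perm + 1)."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    n_x = len(x)
    observed = np.mean(x) - np.mean(y)
    count = 1  # the identity always satisfies T >= T_obs
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        if perm[:n_x].mean() - perm[n_x:].mean() >= observed:
            count += 1
    return count / (n_perm + 1)


# Usage: two samples of 30 whose means differ by 1.5 standard deviations.
rng = np.random.default_rng(1)
x = rng.normal(1.5, 1.0, size=30)
y = rng.normal(0.0, 1.0, size=30)
print(random_permutation_pvalue(x, y))
```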
Continuous Multivariate Distributions
- Mathematics
- 2009
In this article, we present a concise review of developments on various continuous multivariate distributions. We first present some basic definitions and notations. Then, we present several…
Principal Components in Regression Analysis
- Business
- 1986
As illustrated in the other chapters of this book, research continues into a wide variety of methods of using PCA in analysing various types of data. However, in no area has this research been more…
Support-Vector Networks
- Computer Science · Machine Learning
- 2004
High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated and the performance of the support-vector network is compared to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
On the Folded Normal Distribution
- Mathematics
- 2013
The characteristic function of the folded normal distribution and its moment function are derived. The entropy of the folded normal distribution and the Kullback–Leibler divergence from the normal and half…
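The folded normal is the distribution of |X| for X ~ N(μ, σ²). Its first two moments have standard closed forms, sketched below: E|X| = σ√(2/π)·exp(−μ²/(2σ²)) + μ(1 − 2Φ(−μ/σ)), and Var|X| = μ² + σ² − (E|X|)², because folding does not change E[X²]. This is not necessarily how the cited paper derives them (it works through the characteristic function); the function names are hypothetical.

```python
import math

import numpy as np


def folded_normal_mean(mu, sigma):
    """E|X| for X ~ N(mu, sigma^2)."""
    phi_cdf = 0.5 * (1 + math.erf(-mu / (sigma * math.sqrt(2))))  # Phi(-mu / sigma)
    return (sigma * math.sqrt(2 / math.pi) * math.exp(-mu**2 / (2 * sigma**2))
            + mu * (1 - 2 * phi_cdf))


def folded_normal_var(mu, sigma):
    """Var|X| = mu^2 + sigma^2 - (E|X|)^2, since E[X^2] is unchanged by folding."""
    return mu**2 + sigma**2 - folded_normal_mean(mu, sigma) ** 2


# Usage: compare the closed forms with a Monte Carlo estimate.
rng = np.random.default_rng(0)
samples = np.abs(rng.normal(1.5, 2.0, size=200_000))
print(folded_normal_mean(1.5, 2.0), samples.mean())
print(folded_normal_var(1.5, 2.0), samples.var())
```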