Corpus ID: 254246754

Measure of Strength of Evidence for Visually Observed Differences between Subpopulations

Authors: Xi Yang, Jan Hannig, Katherine A. Hoadley, Iain Carmichael, J. S. Marron

An increasingly important data-analytic challenge is understanding the relationships between subpopulations. Visualization methods such as PCA, t-SNE, and UMAP provide many useful insights into those relationships and are popular, especially in high-dimensional contexts such as bioinformatics. While visualization is often insightful, it can also be deceptive. This motivates the need for careful assessment of the strength of the evidence for differences between subpopulations. Because…

Direction-Projection-Permutation for High-Dimensional Hypothesis Tests

A computational tool called direction-projection-permutation (DiProPerm) is proposed, which rigorously assesses whether a binary linear classifier is detecting statistically significant differences between two high-dimensional distributions.
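The three DiProPerm steps can be sketched in Python as follows. This sketch rests on a few assumptions. It uses the mean-difference direction for brevity, although DWD or SVM directions are common choices. It uses the difference of projected class means as the univariate statistic. The function name `diproperm` and its parameters are hypothetical.

The essential design choice is that the direction is re-estimated for every relabeling. In high dimension, a direction fitted to the labels makes the classes look separated even for pure noise. Only the permutation distribution of the refitted statistic calibrates that apparent separation.

```python
import numpy as np

def diproperm(X, y, n_perm=200, rng=None):
    """Direction-projection-permutation sketch (hypothetical helper).

    Direction: mean-difference direction (an assumption; DWD or SVM
    directions are common alternatives). Statistic: difference of the
    projected class means. Returns (observed statistic, p-value).
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=bool)

    def stat(labels):
        # Direction: unit vector along the difference of class means.
        w = X[labels].mean(axis=0) - X[~labels].mean(axis=0)
        w /= np.linalg.norm(w)
        # Projection: one-dimensional scores along that direction.
        scores = X @ w
        # Univariate statistic computed on the projected scores.
        return scores[labels].mean() - scores[~labels].mean()

    observed = stat(y)
    # Permutation: shuffle labels and redo direction + projection each time.
    perm_stats = np.array([stat(rng.permutation(y)) for _ in range(n_perm)])
    # Add-one Monte Carlo p-value (the observed labeling is counted once).
    p_value = (1 + np.sum(perm_stats >= observed)) / (n_perm + 1)
    return observed, p_value
```

As an example, `diproperm(X, y, n_perm=200, rng=1)` on pure Gaussian noise with arbitrary labels should typically give a large p-value. Shifting one class by a constant in every coordinate should drive the p-value toward its minimum, 1/(n_perm + 1).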

Properties of Balanced Permutations

Balanced permutation reference distributions turn out not to have the correct null behavior, which can be traced to their lack of a group structure; they can give p-values that are too permissive to varying degrees.
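The group-structure point can be made concrete with a toy check. It assumes the usual setup: two groups of equal, even size n, where a balanced relabeling moves exactly n/2 units from each group to the other. The helper names `is_balanced` and `compose` are illustrative only. This is a toy illustration, not the paper's analysis.

The check shows two failures of group structure. Composing a balanced relabeling with itself can return the identity, and the identity (the observed labeling) is not balanced. So the set is not closed under composition, and it is therefore not a group.

```python
import numpy as np

def is_balanced(perm, n):
    """True if relabeling `perm` is balanced (hypothetical helper).

    Units 0..n-1 start in group A and n..2n-1 in group B. After applying
    `perm`, the new group A is perm[:n]; the relabeling is balanced when
    exactly half of those units came from the original group A.
    """
    perm = np.asarray(perm)
    return int(np.sum(perm[:n] < n)) == n // 2

def compose(p, q):
    # Apply q, then p: data x becomes x[q], then x[q][p] == x[q[p]].
    return np.asarray(q)[np.asarray(p)]

n = 2                       # two units per group (n must be even)
p = np.array([0, 2, 1, 3])  # swap one unit of A with one unit of B
print(is_balanced(p, n))              # True
print(is_balanced(compose(p, p), n))  # False: p composed with p is the identity
```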

Geometric representation of high dimension, low sample size data

This analysis shows a tendency for the data to lie deterministically at the vertices of a regular simplex, which means all the randomness in the data appears only as a random rotation of this simplex.
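This effect can be seen in a quick simulation. The simulation assumes i.i.d. standard Gaussian rows (identity covariance), which is the simplest setting for the result. With d much larger than n, point norms cluster near √d and pairwise distances cluster near √(2d). The sample therefore sits close to a regular simplex. The particular n, d, and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 10_000                 # few points, very high dimension
X = rng.standard_normal((n, d))  # i.i.d. N(0, I_d) rows

# Each point's distance from the origin concentrates near sqrt(d).
norm_ratio = np.linalg.norm(X, axis=1) / np.sqrt(d)

# Every pairwise distance concentrates near sqrt(2 d), so the n points lie
# close to the vertices of a regular simplex with that edge length.
i, j = np.triu_indices(n, k=1)
pair_ratio = np.linalg.norm(X[i] - X[j], axis=1) / np.sqrt(2 * d)

print(norm_ratio.round(3))
print(pair_ratio.round(3))
```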

Fast Algorithms for Large-Scale Generalized Distance Weighted Discrimination

This work designs a scalable and robust algorithm for solving large-scale generalized DWD problems; the algorithm is sometimes even more efficient than the highly optimized LIBLINEAR and LIBSVM are at solving the corresponding SVM problems.

Distance‐weighted discrimination

A useful property of distance‐weighted discrimination, beyond just good classification performance, is that it provides a direction vector in high‐dimensional data space with several purposes, including indication of driving phenomena behind class differences, data visualization, and batch adjustment tasks.
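A minimal sketch of fitting a DWD direction vector is given below. It assumes the loss-function form of DWD with index q = 1 plus a ridge penalty on w:

- V(u) = 1 − u for u ≤ 1/2
- V(u) = 1/(4u) otherwise

The sketch uses plain gradient descent with a step size taken from a Lipschitz bound. It does not use the second-order cone or large-scale solvers used for DWD in practice. `fit_dwd` and its parameters are hypothetical. The normalized w is the direction vector used for projection and visualization.

```python
import numpy as np

def dwd_loss_grad(u):
    """DWD loss V(u) with index q = 1 (assumed form) and its derivative V'(u).

    V(u) = 1 - u      for u <= 1/2
    V(u) = 1 / (4u)   for u >  1/2
    """
    u = np.asarray(u, dtype=float)
    small = u <= 0.5
    safe = np.where(small, 0.5, u)  # placeholder where the 1/(4u) branch is unused
    loss = np.where(small, 1.0 - u, 1.0 / (4.0 * safe))
    grad = np.where(small, -1.0, -1.0 / (4.0 * safe**2))
    return loss, grad

def fit_dwd(X, y, lam=0.01, n_iter=2000):
    """Minimize mean(V(y_i * (x_i @ w + b))) + lam * ||w||^2 by gradient descent.

    y must take values in {-1, +1}. V' is 4-Lipschitz, so the step 1/L with
    L = 4 * sigma_max([X, 1])^2 / n + 2 * lam guarantees descent.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    Z = np.column_stack([X, np.ones(n)])
    step = 1.0 / (4.0 * np.linalg.norm(Z, 2) ** 2 / n + 2.0 * lam)
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        _, g = dwd_loss_grad(y * (X @ w + b))
        coef = g * y / n  # derivative of the mean loss w.r.t. each fitted value
        w -= step * (X.T @ coef + 2.0 * lam * w)
        b -= step * coef.sum()
    return w, b

# Usage on two well-separated Gaussian classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.0, 1.0, (50, 20)), rng.normal(-1.0, 1.0, (50, 20))])
y = np.r_[np.ones(50), -np.ones(50)]
w, b = fit_dwd(X, y)
direction = w / np.linalg.norm(w)  # unit DWD direction vector
scores = X @ direction             # 1-d projections, e.g. for a jitter plot
accuracy = np.mean(np.sign(X @ w + b) == y)
```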

Exact testing with random permutations

This paper provides an alternative proof that tests based on random permutations are exact, viewing the test as a “conditional Monte Carlo test”, as it has been called in the literature; the results can be used to prove properties of various multiple testing procedures based on random permutations.
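Below is a sketch of a two-sample random-permutation test that uses the add-one p-value, p = (1 + #{T_b ≥ T_obs}) / (B + 1). This is a standard way to keep such a test exact. The sketch assumes a difference-in-means statistic, and `perm_pvalue` is a hypothetical helper. Counting the observed labeling among the B random permutations is what keeps P(p ≤ α) ≤ α under the null. The short simulation checks that the null rejection rate stays near the nominal level.

```python
import numpy as np

def perm_pvalue(x, y, n_perm, rng):
    """Two-sample random-permutation test with the add-one p-value."""
    pooled = np.concatenate([x, y])
    n_x = len(x)

    def stat(z):
        # Absolute difference in group means for a given arrangement.
        return abs(z[:n_x].mean() - z[n_x:].mean())

    t_obs = stat(pooled)
    t_perm = np.array([stat(rng.permutation(pooled)) for _ in range(n_perm)])
    return (1 + np.sum(t_perm >= t_obs)) / (n_perm + 1)

# Under the null (both samples from one distribution) the rejection rate at
# level alpha should not exceed alpha, up to simulation error.
rng = np.random.default_rng(0)
alpha, n_sim = 0.05, 400
rejections = 0
for _ in range(n_sim):
    x, y = rng.standard_normal(20), rng.standard_normal(20)
    rejections += int(perm_pvalue(x, y, n_perm=99, rng=rng) <= alpha)
print(rejections / n_sim)
```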

Continuous Multivariate Distributions

In this article, we present a concise review of developments on various continuous multivariate distributions. We first present some basic definitions and notations. Then, we present several…

Principal Components in Regression Analysis

As illustrated in the other chapters of this book, research continues into a wide variety of methods of using PCA in analysing various types of data. However, in no area has this research been more…

Support-Vector Networks

High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated, and the performance of the support-vector network is compared to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.

On the Folded Normal Distribution

The characteristic function of the folded normal distribution and its moment function are derived. The entropy of the folded normal distribution and the Kullback–Leibler divergence from the normal and half…
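As a concrete reference point for the moments, the mean and variance of |X| for X ~ N(μ, σ²) have standard closed forms. They are sketched below together with a Monte Carlo check. The sketch uses the textbook formulas, not the characteristic-function derivation described in the summary. `folded_normal_moments` is a hypothetical name.

```python
import math
import numpy as np

def folded_normal_moments(mu, sigma):
    """Mean and variance of |X| for X ~ N(mu, sigma^2).

    E|X|   = sigma * sqrt(2/pi) * exp(-mu^2 / (2 sigma^2))
             + mu * (1 - 2 * Phi(-mu / sigma))
    Var|X| = mu^2 + sigma^2 - (E|X|)^2, since E[X^2] = mu^2 + sigma^2.
    """
    # Standard normal CDF at -mu/sigma, written with math.erf.
    phi = 0.5 * (1.0 + math.erf((-mu / sigma) / math.sqrt(2.0)))
    mean = (sigma * math.sqrt(2.0 / math.pi) * math.exp(-mu**2 / (2.0 * sigma**2))
            + mu * (1.0 - 2.0 * phi))
    var = mu**2 + sigma**2 - mean**2
    return mean, var

# Monte Carlo check of the closed forms.
rng = np.random.default_rng(0)
mu, sigma = 1.0, 2.0
samples = np.abs(rng.normal(mu, sigma, size=1_000_000))
print(folded_normal_moments(mu, sigma))
print(samples.mean(), samples.var())
```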