A Super Scalable Algorithm for Short Segment Detection
@article{Hao2020ASS, title={A Super Scalable Algorithm for Short Segment Detection}, author={Ning Hao and Yue Niu and Feifei Xiao and Heping Zhang}, journal={Statistics in Biosciences}, year={2020}, volume={13}, pages={18-33} }
In many applications such as copy number variant (CNV) detection, the goal is to identify short segments on which the observations have different means or medians from the background. Those segments are usually short and hidden in a long sequence and hence are very challenging to find. We study a super scalable short segment (4S) detection algorithm in this paper. This nonparametric method clusters the locations where the observations exceed a threshold for segment detection. It is…
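The threshold-and-cluster idea in the abstract can be illustrated with a minimal sketch. It is not the authors' 4S implementation: the function name `detect_short_segments` and the parameters `max_gap` and `min_hits` are hypothetical, and both the gap-based merging rule and the minimum-count filter are assumptions made only for illustration.

```python
from typing import List, Sequence, Tuple


def detect_short_segments(
    x: Sequence[float],
    threshold: float,
    max_gap: int = 1,   # hypothetical: non-exceeding positions tolerated inside a cluster
    min_hits: int = 3,  # hypothetical: minimum number of exceedances per reported segment
) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index pairs of candidate segments."""
    # Step 1: locations where the observation exceeds the threshold.
    hits = [i for i, v in enumerate(x) if v > threshold]

    # Step 2: cluster the hits. A hit joins the current cluster when at most
    # `max_gap` non-exceeding positions separate it from the previous hit.
    clusters: List[List[int]] = []
    for i in hits:
        if clusters and i - clusters[-1][-1] - 1 <= max_gap:
            clusters[-1].append(i)
        else:
            clusters.append([i])

    # Step 3: drop clusters with too few hits (assumed here to be noise).
    return [(c[0], c[-1]) for c in clusters if len(c) >= min_hits]


# Toy sequence: background near 0, one elevated stretch, one isolated spike.
signal = [0.1, -0.2, 0.3, 2.1, 2.4, 0.2, 2.2, 2.5, 0.0, -0.1, 2.3, 0.1, 0.0]
print(detect_short_segments(signal, threshold=1.0))  # prints [(3, 7)]
```

The sketch makes one pass over the data plus one pass over the exceedance locations, so its cost grows linearly with the sequence length. Segments with a lower mean than the background could be handled the same way by negating the sequence first.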
One Citation
Detecting and Evaluating Dust‐Events in North China With Ground Air Quality Data
- Environmental Science · Earth and Space Science
- 2021
We propose a dust‐event detection and tracking procedure based on air quality data from the ground monitoring network by detecting temporal and spatial change‐points in PM10 concentration. It…
References
Robust detection and identification of sparse segments in ultrahigh dimensional data analysis
- Computer Science · Journal of the Royal Statistical Society. Series B, Statistical Methodology
- 2012
This work proposes a computationally efficient method that provides a robust and near-optimal solution for segment identification over a wide range of noise distributions, theoretically quantifies the conditions for detecting the segment signals, and shows that the method near-optimally estimates the signal segments whenever it is possible to detect their existence.
Optimal Sparse Segment Identification With Application in Copy Number Variation Analysis
- Computer Science · Journal of the American Statistical Association
- 2010
An efficient likelihood ratio selection (LRS) procedure for identifying the segments is developed, and the asymptotic optimality of this method is presented in the sense that the LRS can separate the signal segments from the noise as long as the signals are in the identifiable regions.
Modified screening and ranking algorithm for copy number variation detection
- Biology · Bioinform.
- 2015
The modified SaRa method improves SaRa theoretically and numerically, for identifying CNVs with high-throughput genotyping data, and achieves better performance than the circular binary segmentation (CBS) method.
Wild binary segmentation for multiple change-point detection
- Computer Science
- 2014
This work proposes a new technique, called wild binary segmentation (WBS), for consistent estimation of the number and locations of multiple change-points in data, together with two stopping criteria for WBS: one based on thresholding and the other based on what is termed the "strengthened Schwarz information criterion".
The Screening and Ranking Algorithm to Detect DNA Copy Number Variations.
- Computer Science · The Annals of Applied Statistics
- 2012
This study proposes the Screening and Ranking algorithm (SaRa), which can detect CNVs quickly and accurately with complexity down to O(n), characterizes its theoretical properties, and presents numerical analysis of the algorithm.
Multiple Change-Point Detection via a Screening and Ranking Algorithm.
- Computer Science · Statistica Sinica
- 2013
A false discovery rate approach to the multiple change-point problem is developed, and a strong sure coverage property is established for the SaRa, demonstrating its superiority over other commonly used algorithms.
Multiple Change-point Detection: a Selective Overview
- Computer Science
- 2015
This article provides an in-depth discussion of a normal mean change-point model from the perspectives of regression analysis, hypothesis testing, consistency, and inference, and presents a strategy to gather and aggregate local information for change-point detection that has become the cornerstone of several emerging methods.
Near-optimal detection of geometric objects by fast multiscale methods
- Computer Science · IEEE Transactions on Information Theory
- 2005
A general approach to detectors for "geometric" objects in noisy data is described, which covers several classes of geometrically defined signals, and allows for asymptotically optimal detection thresholds and fast algorithms for near-optimal detectors.
Spatial smoothing and hot spot detection for CGH data using the fused lasso.
- Computer Science · Biostatistics
- 2008
The fused lasso criterion leads to a convex optimization problem, and a fast algorithm is provided for its solution, which generally outperforms competing methods for calling gains and losses in CGH data.
Circular binary segmentation for the analysis of array-based DNA copy number data.
- Biology · Biostatistics
- 2004
A modification of binary segmentation, called circular binary segmentation, is developed to translate noisy intensity measurements into regions of equal copy number in DNA sequence copy number data.