Robust detection and identification of sparse segments in ultrahigh dimensional data analysis

@article{TonyCai2012RobustDA,
  title={Robust detection and identification of sparse segments in ultrahigh dimensional data analysis},
  author={T. Tony Cai and X. Jessie Jeng and Hongzhe Li},
  journal={Journal of the Royal Statistical Society: Series B (Statistical Methodology)},
  year={2012},
  volume={74}
}
Summary.  Copy number variants (CNVs) are alterations of DNA of a genome that result in the cell having fewer or more than two copies of segments of the DNA. CNVs correspond to relatively large regions of the genome, ranging from about one kilobase to several megabases, that are deleted or duplicated. Motivated by CNV analysis based on next generation sequencing data, we consider the problem of detecting and identifying sparse short segments hidden in a long linear sequence of data with an… 

SPARSE SEGMENT IDENTIFICATIONS WITH APPLICATIONS TO DNA COPY NUMBER VARIATION ANALYSIS

This chapter reviews statistical methods for CNV identification, focusing on recently developed methods for sparse segment identification in various settings, with emphasis on problem formulations and the optimal statistical properties of the procedures.

Parametric modeling of whole-genome sequencing data for CNV identification.

This paper considers parametric modeling of the read depth (RD) data from whole-genome sequencing with the aim of identifying the CNVs, including both Poisson and negative-binomial modeling of such count data, and proposes a unified approach of using a mean-matching variance stabilizing transformation.

Detection of Copy Number Variation Regions Using the DNA-Sequencing Data from Multiple Profiles with Correlated Structure

The framework of a fused Lasso latent feature model is used to solve the problem of detecting boundaries of DNA copy number variation (CNV) regions using the DNA-sequencing data from multiple subject samples, and a modified information criterion for selecting the tuning parameter is proposed.

A Statistical Method for Identifying Trait-Associated Copy Number Variants

This work develops a method, CNVtest, to directly identify trait-associated CNVs without the need to identify sample-specific CNVs, and demonstrates the method using simulations and an application to identify the CNVs that are associated with population differentiation.

A robust statistical method for Genome-wide association analysis of human copy number variation

A new robust method is developed to find disease-risk regions related to CNVs that are disproportionately distributed between case and control samples, even if there are batch effects between them, together with a new empirical Bayes rule to deal with overfitting when estimating parameters during testing.

iSeg: an efficient algorithm for segmentation of genomic and epigenomic data

An efficient general-purpose segmentation tool is developed and shown to give results comparable to, or more accurate than, those of many of the most popular segment-calling algorithms used in contemporary genomic data analysis.

A backward procedure for change‐point detection with applications to copy number variation detection

This article proposes a new change‐point detection method, a backward procedure, which is not only fast and simple enough to exploit high‐dimensional data but also performs very well for detecting short signals.

Statistical Methods for Analysis of Multi-Sample Copy Number Variants and ChIP-seq Data

A novel method, CNVtest, is developed to directly identify trait-associated CNVs without the need to identify sample-specific CNVs; it controls the Type I error asymptotically and identifies the true trait-associated CNVs with high probability.

A Super Scalable Algorithm for Short Segment Detection

This paper develops a framework to assign significance levels for detected segments and studies a super scalable short segment (4S) detection algorithm that is computationally efficient and does not rely on a Gaussian noise assumption.

Optimal Sparse Segment Identification With Application in Copy Number Variation Analysis

An efficient likelihood ratio selection (LRS) procedure for identifying the segments is developed, and the asymptotic optimality of this method is presented in the sense that the LRS can separate the signal segments from the noise as long as the signals are in the identifiable regions.

BreakDancer: An algorithm for high resolution mapping of genomic structural variation

The algorithm BreakDancer predicts a wide variety of structural variants, including insertion-deletions (indels), inversions and translocations, and sensitively and accurately detects indels ranging from 10 base pairs to 1 megabase pair that are difficult to detect via a single conventional approach.

CNV-seq, a new method to detect copy number variation using high-throughput sequencing

The results show that the number of reads, not the length of the reads, is the key factor determining the resolution of detection, which favors next-generation sequencing methods that rapidly produce large numbers of short reads.

High-resolution mapping of copy-number alterations with massively parallel sequencing

A collection of ∼14 million aligned sequence reads from human cell lines has comparable power to detect events as the current generation of DNA microarrays and has over twofold better precision for localizing breakpoints (typically, to within ∼1 kilobase).

rSW-seq: Algorithm for detection of copy number alterations in deep sequencing data

This work develops a method for identification of copy number alterations in a tumor genome compared to its matched control, based on application of Smith-Waterman algorithm to single-end sequencing data.

Accurate and exact CNV identification from targeted high-throughput sequence data

This work presents a method combining coverage with map information for the identification of deletions and duplications in targeted sequence data, providing comprehensive mutation screening when combined with a short read aligner.

Sensitive and accurate detection of copy number variants using read depth of coverage.

The results suggest that analysis of read depth is an effective approach for the detection of CNVs, and it captures structural variants that are refractory to established PEM-based methods.

ReadDepth: A Parallel R Package for Detecting Copy Number Alterations from Short Sequencing Reads

The readDepth package for R is presented, which can detect copy number alterations by measuring the depth of coverage obtained by massively parallel sequencing of the genome, and demonstrates a method for inferring copy number using reads generated by whole-genome bisulfite sequencing, thus enabling integrative study of epigenomic and copy number alteration.

CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing.

By genotyping CNVs in the CEPH, Yoruba, and Chinese-Japanese populations, it is estimated that at least 11% of all CNV loci involve complex, multi-allelic events, a considerably higher estimate than reported earlier.

Global variation in copy number in the human genome

A first-generation CNV map of the human genome is constructed through the study of 270 individuals from four populations with ancestry in Europe, Africa or Asia, underscoring the importance of CNV in genetic diversity and evolution and the utility of this resource for genetic disease studies.