MSB: a mean-shift-based approach for the analysis of structural variation in the genome.

@article{Wang2009MSBAM,
  title={MSB: a mean-shift-based approach for the analysis of structural variation in the genome.},
  author={Lu-Yong Wang and Alexej Abyzov and Jan O. Korbel and Michael Snyder and Mark B. Gerstein},
  journal={Genome research},
  year={2009},
  volume={19},
  number={1},
  pages={106--117}
}
Genome structural variation includes segmental duplications, deletions, and other rearrangements, and array-based comparative genomic hybridization (array-CGH) is a popular technology for determining this. Drawing relevant conclusions from array-CGH requires computational methods for partitioning the chromosome into segments of elevated, reduced, or unchanged copy number. Several approaches have been described, most of which attempt to explicitly model the underlying distribution of data based… 
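As a rough illustration of the segmentation task described in the abstract, the sketch below applies generic one-dimensional mean-shift mode seeking to a handful of log-ratio values. It is not the MSB implementation from the paper: the flat kernel, the function names `mean_shift_1d` and `label_modes`, the `bandwidth` and `merge_tol` parameters, and the sample values are all assumptions made for this example.

```python
def mean_shift_1d(values, bandwidth, max_iter=100, tol=1e-6):
    """Shift each value to the mean of its neighbours (flat kernel) until it stops moving."""
    modes = []
    for x in values:
        for _ in range(max_iter):
            # Neighbours of the current estimate within one bandwidth.
            window = [v for v in values if abs(v - x) <= bandwidth]
            if not window:  # defensive guard against floating-point edge cases
                break
            new_x = sum(window) / len(window)
            if abs(new_x - x) < tol:
                break
            x = new_x
        modes.append(x)
    return modes


def label_modes(modes, merge_tol):
    """Give the same label to points whose converged modes lie within merge_tol."""
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if abs(m - c) <= merge_tol:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return centers, labels


# Hypothetical log2 ratios along a chromosome: unchanged, gain, unchanged, loss.
log_ratios = [0.02, -0.05, 0.01, 0.55, 0.61, 0.58, 0.03, -0.02, -0.9, -1.0, -0.95]
modes = mean_shift_1d(log_ratios, bandwidth=0.2)
centers, labels = label_modes(modes, merge_tol=0.1)
print(labels)
```

Consecutive probes that share a label can then be read as one candidate segment. Because this sketch clusters on signal value alone and ignores genomic position, it should be taken as a simplification of the idea rather than a description of the published method.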

Nucleotide-resolution analysis of structural variants using BreakSeq and a breakpoint library
TLDR
A library of breakpoints at nucleotide resolution from collating and standardizing ~2,000 published SVs is assembled, finding that the occurrence of insertions and deletions is more balanced than previously reported and that NAHR-formed breakpoints are associated with relatively rigid, stable DNA helices.
Copynumber: Efficient algorithms for single- and multi-track copy number segmentation
TLDR
The R package copynumber is a software suite for segmentation of single- and multi-track copy number data using algorithms based on coherent least squares principles.
A Genome-Wide Analysis of Array-Based Comparative Genomic Hybridization (CGH) Data to Detect Intra-Species Variations and Evolutionary Relationships
TLDR
A novel method of genome-wide comparison and classification using CGH data that condenses whole genome information, aimed at quantification of intra-species variations and discovery of shared ancestry is proposed and successfully detects existing intra-specific variations with additional evolutionary implications.
Transcriptional landscape estimation from tiling array data using a model of signal shift and drift
TLDR
A new methodology based on a hidden Markov model that embeds the segmentation of a continuous-valued signal in a probabilistic setting and permits retrieving more information than a unique segmentation by giving access to the whole probability distribution of the transcription profile.
Effect of Noise on Estimates of Stepwise Changes in Genome DNA Chromosomal Systems
TLDR
Estimates of the CNVs combined with the bounds proposed may play a crucial role for medical experts to make decisions about true chromosomal changes and even their existence.
AGE: defining breakpoints of genomic structural variants at single-nucleotide resolution, through optimal alignments with gap excision
TLDR
A dynamic-programming algorithm, called AGE for Alignment with Gap Excision, finds the optimal solution by simultaneously aligning the 5′ and 3′ ends of two given sequences and introducing a ‘large-gap jump’ between the local end alignments to maximize the total alignment score.
Statistical Methods for the Analysis of Copy Number Variation
TLDR
Computational approaches based on NGS data have been proposed and applied to specific genomic loci and an integrated framework of read-depth and split-read based approaches was developed to pinpoint breakpoints of CNV events occurring across samples.
Computational tools for copy number variation (CNV) detection using next-generation sequencing data: features and perspectives
TLDR
The recent advances in computational methods pertaining to CNV detection using whole genome and whole exome sequencing data are reviewed to discuss their strengths and weaknesses and suggest directions for future development.
Analysis of variable retroduplications in human populations suggests coupling of retrotransposition to cell division.
TLDR
It is shown that cell division is coupled to retrotransposition and, perhaps, is even a requirement for it, and that a correct phylogenetic tree of human subpopulations based solely on retroduplications can be reconstructed.

References

A statistical approach for array CGH data analysis
TLDR
It is demonstrated that existing methods for estimating the number of segments are not well adapted in the case of array CGH data, and an adaptive criterion is proposed that detects previously mapped chromosomal aberrations.
Systematic prediction and validation of breakpoints associated with copy-number variants in the human genome
TLDR
An iterative, “active” approach to initially scoring with a preliminary model, performing targeted validations, retraining the model, and then rescoring, and a flexible parameterization system that intuitively collapses from a full model of 2,503 parameters to a core one of only 10 enable the study of CNV population frequencies.
Continuous-index hidden Markov modelling of array CGH copy number data
TLDR
A continuous-index hidden Markov model for aCGH data as well as a Monte Carlo EM algorithm to estimate its parameters are described and it is shown that for a dataset from the BT-474 cell line analysed on 32K BAC tiling microarrays, this model yields considerably better model fit compared to a discrete-index HMM.
Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data
TLDR
11 different algorithms for analyzing array CGH data are compared, based on diverse techniques such as mixture models, Hidden Markov Models, maximum likelihood, regression, wavelets and genetic algorithms, to reveal general characteristics that are helpful to the biological investigator.
A faster circular binary segmentation algorithm for the analysis of array CGH data
TLDR
A hybrid approach to obtain the P-value of the test statistic in linear time is presented and it is shown that the substantial gain in speed with only a negligible loss in accuracy and that the stopping rule further increases speed.
Paired-End Mapping Reveals Extensive Structural Variation in the Human Genome
TLDR
High-throughput and massive paired-end mapping (PEM) was used to map SVs in an African and in a putatively European individual and identified shared and divergent SVs relative to the reference genome, documenting that the number of SVs among humans is much larger than initially hypothesized; many of the SVs potentially affect gene function.
Accurate detection of aneuploidies in array CGH and gene expression microarray data
TLDR
ChARM (Chromosomal Aberration Region Miner), a robust and accurate expectation-maximization based method for identification of segmental aneuploidies (partial chromosome changes) from gene expression and array CGH microarray data, is presented.
A method for calling gains and losses in array CGH data.
TLDR
A new algorithm 'Cluster along chromosomes' (CLAC) is proposed for the analysis of array CGH data, which builds hierarchical clustering-style trees along each chromosome arm (or chromosome), and then selects the 'interesting' clusters by controlling the False Discovery Rate (FDR) at a certain level.