A Statistical Framework for the Analysis of ChIP-Seq Data.


Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) has revolutionized experiments for genome-wide profiling of DNA-binding proteins, histone modifications, and nucleosome occupancy. As the cost of sequencing decreases, many researchers are switching from microarray-based technologies (ChIP-chip) to ChIP-Seq for genome-wide studies of transcriptional regulation. Despite its increasing and well-deserved popularity, little work has investigated and accounted for sources of bias in the ChIP-Seq technology. These biases typically arise from both the standard pre-processing protocol and the underlying DNA sequence of the generated data. We study data from a naked DNA sequencing experiment, which sequences non-cross-linked DNA after deproteinizing and shearing, to understand the factors affecting the background distribution of data generated in a ChIP-Seq experiment. We introduce a background model that accounts for apparent sources of bias such as mappability and GC content, and we develop a flexible mixture model named MOSAiCS for detecting peaks in both one- and two-sample analyses of ChIP-Seq data. We illustrate that our model fits observed ChIP-Seq data well and further demonstrate the advantages of MOSAiCS over commonly used tools for ChIP-Seq data analysis with several case studies.
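
To make the idea of a mixture model for peak detection concrete, the following is a minimal sketch of a two-component mixture fitted to per-bin read counts by expectation-maximization. It is not the MOSAiCS model: the abstract does not specify the component distributions, so the Poisson components, the absence of mappability and GC covariates, the starting values, and the function names (`poisson_pmf`, `fit_two_poisson_mixture`) are all assumptions made purely for illustration.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of count k with rate lam, computed in log space."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def fit_two_poisson_mixture(counts, n_iter=200):
    """Toy EM fit of pi_bg * Poisson(lam_bg) + (1 - pi_bg) * Poisson(lam_sig).

    Illustrative assumption only: a real background model would also
    condition on covariates such as mappability and GC content.
    Returns (pi_bg, lam_bg, lam_sig, resp), where resp[i] is the posterior
    probability that bin i belongs to the background component.
    """
    n = len(counts)
    mean = sum(counts) / n
    # Hypothetical starting values: background below the mean, signal above it.
    lam_bg = max(0.5 * mean, 1e-3)
    lam_sig = max(2.0 * mean, 1.0)
    pi_bg = 0.9
    resp = [0.5] * n
    for _ in range(n_iter):
        # E-step: posterior probability that each bin is background.
        resp = []
        for k in counts:
            b = pi_bg * poisson_pmf(k, lam_bg)
            s = (1.0 - pi_bg) * poisson_pmf(k, lam_sig)
            resp.append(b / (b + s) if (b + s) > 0 else 0.5)
        # M-step: responsibility-weighted rate and mixing-weight updates.
        w_bg = max(sum(resp), 1e-12)
        w_sig = max(n - sum(resp), 1e-12)
        lam_bg = max(sum(r * k for r, k in zip(resp, counts)) / w_bg, 1e-6)
        lam_sig = max(sum((1.0 - r) * k for r, k in zip(resp, counts)) / w_sig, 1e-6)
        pi_bg = sum(resp) / n
    return pi_bg, lam_bg, lam_sig, resp


# Synthetic bin counts (made up for this example): mostly low background
# bins followed by a few enriched bins.
background_bins = [0, 1, 2, 1, 0, 2, 1, 3, 1, 0] * 9
enriched_bins = [20, 25, 30, 22, 28, 24, 26, 21, 27, 27]
pi_bg, lam_bg, lam_sig, resp = fit_two_poisson_mixture(background_bins + enriched_bins)

# Call a bin a candidate peak when its background posterior is small.
candidate_peaks = [i for i, r in enumerate(resp) if r < 0.05]
```

In this sketch a bin is flagged when its posterior probability of being background falls below an arbitrary cutoff of 0.05; a practical analysis would choose the threshold by controlling an error rate such as the false discovery rate.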

Cite this paper

@article{Kuan2011ASF, title={A Statistical Framework for the Analysis of ChIP-Seq Data.}, author={Pei Fen Kuan and Oliana Carnevali and Guangjin Pan and James A. Thomson and Ron M. Stewart and S{\"u}nd{\"u}z Keles}, journal={Journal of the American Statistical Association}, year={2011}, volume={106}, number={495}, pages={891--903} }