Modeling bias and variation in the stochastic processes of small RNA sequencing

Christos P. Argyropoulos, Alton Etheridge, Nikita A. Sakhanenko, David J. Galas
Nucleic Acids Research, e104
Abstract The use of RNA-seq as the preferred method for the discovery and validation of small RNA biomarkers has been hindered by high quantitative variability and biased sequence counts. In this paper we develop a statistical model for sequence counts that accounts for ligase bias and stochastic variation in sequence counts. This model implies a linear quadratic relation between the mean and variance of sequence counts. Using a large number of sequencing datasets, we demonstrate how one can… 
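
To make the mean-variance statement concrete, here is a minimal sketch of what a linear quadratic relation looks like in data. It is not the authors' model: it assumes a generic gamma-Poisson (negative binomial) count process as a stand-in, for which Var = a·mean + b·mean² with a = 1 and b equal to the dispersion. The function names, the dispersion value, and the simulation settings are all hypothetical.

```python
import numpy as np

def simulate_counts(means, dispersion, n_replicates, rng):
    """Draw gamma-Poisson counts, one column per sequence.

    Assumed stand-in model (not taken from the paper):
    Var = mean + dispersion * mean**2.
    """
    means = np.asarray(means, dtype=float)
    shape = 1.0 / dispersion
    # Gamma rates with mean `means` and variance dispersion * means**2
    rates = rng.gamma(shape, means * dispersion, size=(n_replicates, means.size))
    return rng.poisson(rates)

def fit_mean_variance(counts):
    """Fit Var = a * mean + b * mean**2 across sequences.

    Dividing by the mean gives Var / mean = a + b * mean, a straight line.
    The weights mean / Var turn this into a relative-error fit, so that
    highly expressed sequences do not dominate.
    """
    m = counts.mean(axis=0)
    v = counts.var(axis=0, ddof=1)
    b, a = np.polyfit(m, v / m, 1, w=m / v)  # polyfit returns slope first
    return a, b

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_means = np.logspace(0, 3, 200)  # hypothetical expression levels
    counts = simulate_counts(true_means, dispersion=0.1, n_replicates=2000, rng=rng)
    a_hat, b_hat = fit_mean_variance(counts)
    print(f"linear term a = {a_hat:.3f}, quadratic term b = {b_hat:.3f}")
```

Under these assumptions the linear term reflects Poisson-like sampling noise and the quadratic term reflects extra-Poisson variation. Which sources of bias and variation feed each term in the paper's own model is not recoverable from the truncated abstract.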


Analysis and correction of compositional bias in sparse sequencing count data
It is argued that detailed experiments specifically addressing the influence of compositional bias in metagenomics are needed, and an empirical Bayes normalization approach is proposed to overcome this problem.
Fingerprints of Modified RNA Bases from Deep Sequencing Profiles.
The ability of next-generation sequencing (NGS) to detect and distinguish between ten modified bases in synthetic RNAs is tested and patterns are distinct for several of the modifications, suggesting the future use of ultradeep sequencing as a fingerprinting strategy for locating and identifying modifications in cellular RNAs.
Some Statistical and Dynamical Models for the Analysis of Microbial Ecosystems and their Genomic Data
By allowing for regulated post-infection activation, CRISPRs can function by exploiting a dual defense strategy of abortive infection and anti-viral resistance; the ecological and evolutionary dynamics of such a costly defense mechanism are theoretically analyzed in simplified models of prokaryote-phage coevolution.
Role of MicroRNAs in Renal Parenchymal Diseases—A New Dimension
The role of miRNAs in normal renal development and physiology, in maladaptive renal repair after injury, and in the pathogenesis of renal parenchymal diseases is presented.
Study protocol: rationale and design of the community-based prospective cohort study of kidney function and diabetes in rural New Mexico, the COMPASS study
The COMPASS study is designed as a community-based program in rural New Mexico aiming to screen for CKD and to discover CKD-related translational biomarkers and to generate novel epigenetic data that are relevant for future studies in the general population.


Why weight? Modelling sample and observational level variability improves power in RNA-seq analyses
A comprehensive analysis involving both simulations and experimental RNA-sequencing data demonstrates that this strategy leads to a universally more powerful analysis and fewer false discoveries when compared to conventional approaches.
A new shrinkage estimator for dispersion improves differential expression detection in RNA-seq data
It is found that the estimated dispersion in existing methods does not adequately capture the heterogeneity of biological variance among samples, so a new empirical Bayes shrinkage estimate of the dispersion parameters is presented and improved DE detection is demonstrated.
Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation
A flexible statistical framework is developed for the analysis of read counts from RNA-Seq gene expression studies, and parallel computational approaches are developed to make non-linear model fitting faster and more reliable, making the application of GLMs to genomic data more convenient and practical.
Removing technical variability in RNA-seq data using conditional quantile normalization
A statistical methodology is described that improves precision by 42% without loss of accuracy and combines robust generalized regression to remove systematic bias introduced by deterministic features such as GC-content and quantile normalization to correct for global distortions.
A powerful and flexible approach to the analysis of RNA sequence count data
BBSeq is described, which incorporates a simple beta-binomial generalized linear model, combined with simple outlier detection and testing approaches, which appears to have favorable characteristics in power and flexibility.
Biases in small RNA deep sequencing data
Recent findings that challenge small non-protein coding RNA-seq data are reviewed and approaches and precautions to overcome or minimize bias are suggested.
Empirical insights into the stochasticity of small RNA sequencing
A simple method for the analysis of RNA sequencing data is provided and shown to be superior to three existing methods for differential expression analysis, using three example datasets of technical and biological replicates.
Identification and remediation of biases in the activity of RNA ligases in small-RNA deep sequencing
By providing a wide spectrum of substrate for the ligase, the pooled-adapter strategy developed here provides a means to overcome issues of bias, and generate more accurate small RNA profiles.
A two-parameter generalized Poisson model to improve the analysis of RNA-seq data
This work proposes a two-parameter generalized Poisson (GP) model for the position-level read counts of RNA-seq data, and shows that the GP model fits the data much better than the traditional Poisson model.
Bias in Ligation-Based Small RNA Sequencing Library Construction Is Determined by Adaptor and RNA Structure
This study investigates the effects of ligation bias by using a pool of randomized ligation substrates, defined mixtures of miRNA sequences, and several combinations of adaptors in HTS library construction, and shows that, like the 3’ adaptor ligation step, the 5’ adaptation is also biased.