StatsDB: platform-agnostic storage and understanding of next generation sequencing run metrics

Abstract

Modern sequencing platforms generate enormous quantities of data in ever-decreasing amounts of time. Additionally, techniques such as multiplex sequencing allow one run to contain hundreds of different samples. With such data comes a significant challenge to understand its quality and to understand how the quality and yield are changing across instruments and over time. As well as the desire to understand historical data, sequencing centres often have a duty to provide clear summaries of individual run performance to collaborators or customers. We present StatsDB, an open-source software package for storage and analysis of next generation sequencing run metrics. The system has been designed for incorporation into a primary analysis pipeline, either at the programmatic level or via integration into existing user interfaces. Statistics are stored in an SQL database and APIs provide the ability to store and access the data while abstracting the underlying database design. This abstraction allows simpler, wider querying across multiple fields than is possible by the manual steps and calculation required to dissect individual reports, e.g. "provide metrics about nucleotide bias in libraries using adaptor barcode X, across all runs on sequencer A, within the last month". The software is supplied with modules for storage of statistics from FastQC, a commonly used tool for analysis of sequence reads, but the open nature of the database schema means it can be easily adapted to other tools. Currently at The Genome Analysis Centre (TGAC), reports are accessed through our LIMS or through a standalone GUI tool, but the API and supplied examples make it easy to develop custom reports and to interface with other packages.
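
The kind of cross-run query quoted in the abstract can be illustrated with a small sketch. This is not the StatsDB schema or API: the two-table layout, the table and column names (`analysis`, `metric`, `instrument`, `barcode`, `run_date`), the metric name `gc_percent`, the helper functions, and the use of Python with SQLite are all assumptions made for illustration. It shows only the general idea that once per-run statistics sit in an SQL database, a question spanning barcode, instrument and date range becomes a single query.

```python
import sqlite3
from datetime import date, timedelta


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE analysis (
            id         INTEGER PRIMARY KEY,
            instrument TEXT,
            barcode    TEXT,
            run_date   TEXT      -- ISO 8601 date, so string comparison orders correctly
        );
        CREATE TABLE metric (
            analysis_id INTEGER REFERENCES analysis(id),
            name        TEXT,
            value       REAL
        );
        """
    )
    today = date.today()

    def days_ago(n):
        return (today - timedelta(days=n)).isoformat()

    # Made-up demo rows: (id, instrument, barcode, run_date)
    analyses = [
        (1, "sequencer_A", "X", days_ago(3)),
        (2, "sequencer_A", "X", days_ago(10)),
        (3, "sequencer_A", "Y", days_ago(5)),   # wrong barcode
        (4, "sequencer_B", "X", days_ago(2)),   # wrong instrument
        (5, "sequencer_A", "X", days_ago(90)),  # too old
    ]
    # Made-up demo values: (analysis_id, metric name, value)
    metrics = [
        (1, "gc_percent", 40.0),
        (2, "gc_percent", 44.0),
        (3, "gc_percent", 60.0),
        (4, "gc_percent", 70.0),
        (5, "gc_percent", 80.0),
    ]
    conn.executemany("INSERT INTO analysis VALUES (?, ?, ?, ?)", analyses)
    conn.executemany("INSERT INTO metric VALUES (?, ?, ?)", metrics)
    return conn


def mean_metric(conn, name, instrument, barcode, days=30):
    """Return (mean value, number of runs) for one metric, filtered by
    instrument, barcode and a look-back window in days."""
    since = (date.today() - timedelta(days=days)).isoformat()
    return conn.execute(
        """
        SELECT AVG(m.value), COUNT(*)
        FROM metric AS m
        JOIN analysis AS a ON a.id = m.analysis_id
        WHERE m.name = ?
          AND a.instrument = ?
          AND a.barcode = ?
          AND a.run_date >= ?
        """,
        (name, instrument, barcode, since),
    ).fetchone()


if __name__ == "__main__":
    conn = build_demo_db()
    mean, runs = mean_metric(conn, "gc_percent", "sequencer_A", "X")
    print(f"mean gc_percent over {runs} runs: {mean}")
```

Wrapping the SQL in a function such as the hypothetical `mean_metric` mirrors the abstraction the abstract describes: callers name the metric and the filters, and do not need to know how the tables are laid out.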

DOI: 10.12688/f1000research.2-248.v2

7 Figures and Tables

Cite this paper

@inproceedings{RamirezGonzalez2013StatsDBPS,
  title={StatsDB: platform-agnostic storage and understanding of next generation sequencing run metrics},
  author={Ricardo H Ramirez-Gonzalez and Richard M. Leggett and Darren N Waite and Anil S. Thanki and Nizar Drou and Mario Caccamo and Robert P. Davey and Cyriac Kandoth and Anuj Kumar and Mick Watson},
  booktitle={F1000Research},
  year={2013}
}