Reproducible Research: A Bioinformatics Case Study

@article{Gentleman2005ReproducibleRA,
  title={Reproducible Research: A Bioinformatics Case Study},
  author={Robert Gentleman},
  journal={Statistical Applications in Genetics and Molecular Biology},
  year={2005},
  volume={4}
}
  • R. Gentleman
  • Published 11 January 2005
  • Computer Science
  • Statistical Applications in Genetics and Molecular Biology
While scientific research and the methodologies involved have gone through substantial technological evolution, the technology involved in the publication of the results of these endeavors has remained relatively stagnant. Publication is largely done in the same manner today as it was fifty years ago. Many journals have adopted electronic formats; however, their orientation and style are little different from a printed document. The documents tend to be static and take little advantage of… 

DataPackageR: Reproducible data preprocessing, standardization and sharing using R/Bioconductor for collaborative data analysis
A central tenet of reproducible research is that scientific results are published along with the underlying data and software code necessary to reproduce and verify the findings. A host of tools and
Reproducible Research Concepts and Tools for Cancer Bioinformatics
TLDR
There is every indication that reproducible discipline is feasible for microarray studies, and reliability of inferences in cancer bioinformatics will be enhanced if commitments to concrete reproducibility are broadly accepted in the research community.
Statistical Analyses and Reproducible Research
TLDR
This article describes a software framework for both authoring and distributing integrated, dynamic documents that contain text, code, data, and any auxiliary content needed to recreate the computations in data analyses, methodological descriptions, simulations, and so on.
Methodology capture: discriminating between the "best" and the rest of community practice
TLDR
It is proposed that the practice of expert authors from the field of evolutionary biology is the closest to contemporary best practice in phylogenetic experimental design, and should also acknowledge the differences between fields such as the specific context of the analysis.
Advantages and Limits in the Adoption of Reproducible Research and R-Tools for the Analysis of Omic Data
TLDR
The benefits that the scientific community can receive from the adoption of Reproducible Research standards in the analysis of high-throughput omic data are illustrated, and several tools that researchers can use to increase the reproducibility of their work are described.
PyPedia: using the wiki paradigm as crowd sourcing environment for bioinformatics protocols
TLDR
PyPedia demonstrates how a wiki can provide a collaborative development, sharing, and even execution environment for biologists and bioinformaticians that complements existing resources and is useful for local and multi-center research teams.
RA: ResearchAssistant for the computational sciences
TLDR
The design and implementation of RA are presented, and it is shown how RA easily scales to make complex experiments repeatable.
ReportingTools: an automated result processing and presentation toolkit for high-throughput genomic analyses
TLDR
ReportingTools, a Bioconductor package, is presented that automatically recognizes and transforms the output of many common Bioconductor packages into rich, interactive, HTML-based reports, which can be easily customized for specific applications using the well-defined API.

References

SHOWING 1-10 OF 20 REFERENCES
Statistical Analyses and Reproducible Research
TLDR
This article describes a software framework for both authoring and distributing integrated, dynamic documents that contain text, code, data, and any auxiliary content needed to recreate the computations in data analyses, methodological descriptions, simulations, and so on.
Document-centered Computing: Compound Document Editors as User Interfaces
TLDR
This paper abstracts from this implementation and uses the conceptual framework of compound documents for a systematic and implementation independent review of such a user model for mathematical computing software.
Auditing of Data Analyses
TLDR
The facility demonstrates that the verification process is possible and computationally reasonable, even for quite large analyses, and that interactive exploration of the audited analyses presents some interesting and extremely challenging problems.
Sweave: Dynamic Generation of Statistical Reports Using Literate Data Analysis
TLDR
Sweave combines typesetting with LaTeX and data analysis with S into integrated statistical documents that can be automatically updated if data or analysis change, which allows truly reproducible research.
Reproducibility of SELDI-TOF protein patterns in serum: comparing datasets from different experiments
TLDR
Concerns suggest that much of the structure uncovered in SELDI proteomic spectra from serum experiments could be due to artifacts of sample processing, not to the underlying biology of cancer.
Emacs Speaks Statistics: A Multiplatform, Multipackage Development Environment for Statistical Analysis
TLDR
Emacs Speaks Statistics (ESS) provides an intelligent and consistent interface between the user and statistics software that understands the syntax for numerous data analysis languages, provides consistent display and editing features across packages, and assists in the interactive or batch execution of statements by statistics packages.
Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data
TLDR
Different discrimination methods for the classification of tumors based on gene expression data include nearest-neighbor classifiers, linear discriminant analysis, and classification trees, which are applied to datasets from three recently published cancer gene expression studies.
Class prediction and discovery using gene expression data
TLDR
A method for performing class prediction is described and illustrated by correctly classifying bone marrow and blood samples from acute leukemia patients, and it is demonstrated how this technique could have discovered the key distinctions among leukemias if they were not already known.
R: A Language for Data Analysis and Graphics
TLDR
The experience designing and implementing a statistical computing language that provides advantages in the areas of portability, computational efficiency, memory management, and scoping is discussed.
Literate Programming
TLDR
This anthology of essays from the inventor of literate programming also contains excerpts from the programs for TeX and METAFONT and CWEB, a system for literate programming in C and related languages.