The Big Data Deluge in Biology: Challenges and Solutions

  • Sarkar Rr
  • Published 2016
  • Biology
  • Global Journal of Technology and Optimization
Over the years, continuous efforts in understanding the complex multi-step processes at different levels have transformed biology from a qualitative to a more quantitative subject, resulting in an enormous generation of information at diverse scales, from molecular/genome to ecological as well as epidemiological/clinical. Biological data is highly overrepresented with respect to its quantity, diversity and analysis. Advances in high throughput experimental techniques have expanded the lengths and…
2 Citations

BIOPYDB: A Dynamic Human Cell Specific Biochemical Pathway Database with Advanced Computational Analyses Platform
BIOPYDB enables both experimental and computational biologists to acquire a comprehensive understanding of signaling cascades in cells, and is designed to be more acceptable and attractive to users in the pathway research community.


Exploiting Big Biology: Integrating Large-scale Biological Data for Function Inference
This review discusses the most pertinent functional data for genome-wide functional inference and describes several methods by which these disparate data types are being integrated.
Big Biological Data: Challenges and Opportunities
Big Data: Astronomical or Genomical?
Estimates show that genomics is a “four-headed beast”—it is either on par with or the most demanding of the domains analyzed here in terms of data acquisition, storage, distribution, and analysis.
Comparison of human cell signaling pathway databases—evolution, drawbacks and challenges
A thorough review of 24 popular and actively functioning cell signaling databases is performed to identify novel and useful features that are not yet included in any of them. The review also highlights their current limitations and proposes reasonable solutions for future database development, which could be useful to the whole scientific community.
ArrayExpress update—simplifying data submissions
The main development over the last two years has been the release of a new data submission tool Annotare, which has reduced the average submission time almost 3-fold and will become the only submission route into ArrayExpress, alongside MAGE-TAB format-based pipelines in the near future.
GenBank® is a comprehensive database that contains publicly available nucleotide sequences for over 340 000 formally described species and integrates these records with a variety of other data including taxonomy nodes, genomes, protein structures, and biomedical journal literature in PubMed.
Big data and the future of ecology
The need for sound ecological science has escalated alongside the rise of the information age and “big data” across all sectors of society. Big data generally refer to massive volumes of data not
PaxDb, a Database of Protein Abundance Averages Across All Three Domains of Life*
This work introduces a meta-resource dedicated to integrating information on absolute protein abundance levels, and places particular emphasis on deep coverage, consistent post-processing and comparability across different organisms.
Computational solutions to large-scale data management and analysis
How to master the different types of computational environments that exist, such as cloud and heterogeneous computing, to successfully tackle big data problems is discussed.
The Protein Data Bank
The goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and plans for the future development of the resource are described.