Sustainable data and metadata management at the BD2K-LINCS Data Coordination and Integration Center

@article{Stathias2018SustainableDA,
  title={Sustainable data and metadata management at the BD2K-LINCS Data Coordination and Integration Center},
  author={Vasileios Stathias and Amar Koleti and Dusica Vidovic and Daniel J. Cooper and Kathleen M. Jagodnik and Raymond Terryn and Michele Forlin and Caty Chung and Denis Torre and Nagi G. Ayad and Mario Medvedovic and Avi Ma’ayan and Ajay Pillai and Stephan C. Sch{\"u}rer},
  journal={Scientific Data},
  year={2018},
  volume={5}
}
The NIH-funded LINCS Consortium is creating an extensive reference library of cell-based perturbation response signatures and sophisticated informatics tools incorporating a large number of perturbagens, model systems, and assays. To date, more than 350 datasets have been generated including transcriptomics, proteomics, epigenomics, cell phenotype and competitive binding profiling assays. The large volume and variety of data necessitate rigorous data standards and effective data management… 
LINCS Data Portal 2.0: next generation access point for perturbation-response signatures
TLDR
The cornerstone of this update has been the decision to reprocess all high-level LINCS datasets and make them accessible at the data point level enabling users to directly access and download any subset of signatures across the entire library independent from the originating source, project or assay.
Mining data and metadata from the gene expression omnibus
TLDR
This work reviews methodologies developed to facilitate the systematic curation and processing of publicly available gene expression datasets from GEO, identifies trends for advanced metadata curation, and summarizes approaches for reprocessing the data within the entire GEO repository.
Improving the Utility of the Tox21 Dataset by Deep Metadata Annotations and Constructing Reusable Benchmarked Chemical Reference Signatures
TLDR
The importance of data standards in reporting screening results, and of high-quality annotations that enable re-use and interpretation of these data, is demonstrated.
Emerging Bioinformatics Methods and Resources in Drug Toxicology.
TLDR
This work reviews databases containing toxicogenomics and chemical-phenotype information, as well as appropriate bioinformatics approaches currently used to analyze such data, and presents an overview of suitable tools available for best-practice drug safety analysis.
Informatics Approaches for Harmonized Intelligent Integration of Stem Cell Research
TLDR
Three broad sets of functional features that provide utility for future stem cell research and facilitate bioinformatics workflows were identified and consisted of common data elements, data visualization and analysis tools, and biomedical ontologies for data integration.
Community Approaches for Integrating Environmental Exposures into Human Models of Disease
TLDR
A preliminary semantic data model is presented that will facilitate the inclusion of exposure data in computational analysis of human disease, along with use cases and competency questions for further community-driven model development and refinement.
The Collaborative Metadata Repository (CoMetaR) Web App: Quantitative and Qualitative Usability Evaluation.
TLDR
This study aims to provide a metadata management app with high usability that assists scientists in compiling and using rich metadata and can be adapted to evaluate apps within the medical informatics field and potentially beyond.

References

The center for expanded data annotation and retrieval
The Center for Expanded Data Annotation and Retrieval is studying the creation of comprehensive and expressive metadata for biomedical datasets to facilitate data discovery, data interpretation, and…
Metadata Standard and Data Exchange Specifications to Describe, Model, and Integrate Complex and Diverse High-Throughput Screening Data from the Library of Integrated Network-based Cellular Signatures (LINCS)
The National Institutes of Health Library of Integrated Network-based Cellular Signatures (LINCS) program is generating extensive multidimensional data sets, including biochemical, genome-wide…
Data Portal for the Library of Integrated Network-based Cellular Signatures (LINCS) program: integrated access to diverse large-scale cellular perturbation response data
TLDR
This work describes the LINCS Data Portal (LDP), a unified web interface for accessing datasets generated by the LINCS DSGCs, and its underlying database, the LINCS Data Registry (LDR).
Evolving BioAssay Ontology (BAO): modularization, integration and applications
TLDR
The evolution of BAO with respect to its formal structures, engineering approaches, and content is described. These changes enable modeling of complex assays and integration with other ontologies and datasets, supporting effective integration, aggregation, retrieval, and analysis of drug screening data.
Formalization, Annotation and Analysis of Diverse Drug and Probe Screening Assay Datasets Using the BioAssay Ontology (BAO)
TLDR
The BioAssay Ontology has been developed with a focus on data integration and analysis enabling the classification of assays and screening results by concepts that relate to format, assay design, technology, target, and endpoint and offers the potential to infer new knowledge from a corpus of assay results.
BioSharing: curated and crowd-sourced metadata standards, databases and data policies in the life sciences
TLDR
This article describes BioSharing, a manually curated, searchable portal of three linked registries that harnesses community-led curation to collate and cross-reference resources across the life sciences from around the world.
Identifiers.org and MIRIAM Registry: community resources to provide persistent identification
TLDR
This work describes the new parallel identification scheme and the updated supporting software infrastructure, and introduces the new Identifiers.org service, which provides directly resolvable identifiers in the form of Uniform Resource Locators (URLs).
NCBI GEO: archive for functional genomics data sets—update
TLDR
The Gene Expression Omnibus is an international public repository for high-throughput microarray and next-generation sequence functional genomic data sets submitted by the research community. It supports archiving of raw data, processed data, and metadata, which are indexed, cross-linked, and searchable.
Ontological representation, integration, and analysis of LINCS cell line cells and their cellular responses
TLDR
This work demonstrated how to ontologically model LINCS cellular signatures such as their non-tumorigenic epithelial cell type, three-dimensional growth, latrunculin-A-induced actin depolymerization and apoptosis, and cell line transfection.