• Corpus ID: 4535973

Bioschemas: From Potato Salad to Protein Annotation

Alasdair J. G. Gray, Carole A. Goble, Rafael C. Jimenez
The life sciences have a wealth of data resources with a wide range of overlapping content. Key repositories, such as UniProt for protein data or Entrez Gene for gene data, are well known and their content is easily discovered through search engines. However, there is a long tail of bespoke datasets with important content that are not so prominent in search results. Building on the success of Schema.org for making a wide range of structured web content more discoverable and interpretable, e.g…
Generating molecular entities as structured data
This paper introduces three open-source tools for generating cheminformatics structured data on the Web and presents a solution based on the Bioschemas project.
Biotea-2-Bioschemas, facilitating structured markup for semantically annotated scholarly publications
This work presents the proposed contribution to Bioschemas (from the project “Biotea”), which supports metadata contributions for scholarly publications via profiles and web components, and provides metadata profiles tailored to the Life Sciences domain.
GenoSurf: metadata driven semantic search system for integrated genomic datasets
GenoSurf is a multi-ontology semantic search system that provides access to a consolidated collection of metadata attributes found in the most relevant genomic datasets; the values of 10 attributes are semantically enriched using the best-suited available ontologies.
The iPPI-DB initiative: a community-centered database of protein–protein interaction modulators
This paper presents a completely redesigned version of iPPI-DB, the authors' manually curated database of PPI modulators, which introduces a new web interface relying on crowdsourcing for the maintenance of the database.
RepeatsDB in 2021: improved data and extended classification for protein tandem repeat structures
The major conceptual change compared to the previous version is the hierarchical classification combining top levels based solely on structural similarity with two new levels requiring sequence similarity and describing repeat motifs in collaboration with Pfam.
META-BASE: a Novel Architecture for Large-Scale Genomic Metadata Integration.
This paper describes META-BASE, an architecture for integrating metadata extracted from a variety of genomic data sources, based upon a structured transformation process, and proposes a general, open and extensible pipeline that can easily incorporate any number of new data sources.
The metaRbolomics Toolbox in Bioconductor and beyond
This review provides an extensive overview of existing packages in R for different steps in a typical computational metabolomics workflow, including data processing, biostatistics, metabolite annotation and identification, and biochemical network and pathway analysis.
MobiDB: intrinsically disordered proteins in 2021
The new MobiDB version presents state-of-the-art knowledge on disordered proteins and improves data accessibility for both computational and experimental users.
Harmonizing semantic annotations for computational models in biology
The landscape of current annotation practices among the COmputational Modeling in BIology NEtwork community is reported, and a set of recommendations for building a consensus approach to semantic annotation is provided.
Towards FAIRer Biological Knowledge Networks Using a Hybrid Linked Data and Graph Database Approach
This paper presents the developments to connect, search and share data about genome-scale knowledge networks (GSKN) and demonstrates how the proposed ontology and graph databases considerably improve search and access to interoperable and reusable biological knowledge (i.e. the FAIRness data principles).


Gene: a gene-centered information resource at NCBI
The National Center for Biotechnology Information's (NCBI) Gene database integrates gene-specific information from multiple data sources and represents the integration of curation and automated processing from RefSeq, collaborating model organism databases, consortia such as Gene Ontology, and other databases within NCBI.
Big data makes common schemas even more necessary. 2016