OpenBioLink: a benchmarking framework for large-scale biomedical link prediction

@article{Breit2020OpenBioLinkAB,
  title={OpenBioLink: a benchmarking framework for large-scale biomedical link prediction},
  author={Anna Breit and Simon Ott and Asan Agibetov and Matthias Samwald},
  journal={Bioinformatics},
  year={2020}
}
SUMMARY Recently, novel machine-learning algorithms have shown potential for predicting undiscovered links in biomedical knowledge networks. However, dedicated benchmarks for measuring algorithmic progress have not yet emerged. With OpenBioLink, we introduce a large-scale, high-quality and highly challenging biomedical link prediction benchmark to transparently and reproducibly evaluate such algorithms. Furthermore, we present preliminary baseline evaluation results. AVAILABILITY AND… 
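Link prediction benchmarks of this kind are usually scored by ranking each held-out triple's true entity against corrupted alternatives. The sketch below shows that generic protocol, using filtered ranks, mean reciprocal rank (MRR) and Hits@k. It is not OpenBioLink's own evaluation code. The `score` function, entity names and scores are invented toy values, and the optimistic tie handling is an assumption.

```python
from typing import Callable, Iterable

Triple = tuple[str, str, str]  # (head, relation, tail)


def filtered_tail_rank(test: Triple, entities: Iterable[str], known: set[Triple],
                       score: Callable[[Triple], float]) -> int:
    """1-based rank of the true tail among all candidate tails.

    Candidates that form another known true triple are skipped (the
    'filtered' setting), so the model is not penalized for ranking them high.
    Ties are resolved optimistically here; benchmarks differ on this point."""
    h, r, t = test
    true_score = score(test)
    rank = 1
    for e in entities:
        if e == t or (h, r, e) in known:
            continue
        if score((h, r, e)) > true_score:
            rank += 1
    return rank


def summarize(ranks: list[int], k: int = 10) -> dict[str, float]:
    """Mean reciprocal rank (MRR) and Hits@k over a list of 1-based ranks."""
    n = len(ranks)
    return {
        "MRR": sum(1.0 / r for r in ranks) / n,
        f"Hits@{k}": sum(r <= k for r in ranks) / n,
    }


# Toy example: invented entities and scores, for illustration only.
entities = ["geneA", "geneB", "drugX", "disease1"]
known = {("drugX", "treats", "disease1"), ("drugX", "binds", "geneA")}
toy_scores = {
    ("drugX", "binds", "geneA"): 0.9,
    ("drugX", "binds", "geneB"): 0.95,
    ("drugX", "binds", "drugX"): 0.1,
    ("drugX", "binds", "disease1"): 0.2,
}


def score(triple: Triple) -> float:
    return toy_scores.get(triple, 0.0)


rank = filtered_tail_rank(("drugX", "binds", "geneA"), entities, known, score)
print(rank)                  # 2: geneB outscores the true tail geneA
print(summarize([rank, 1]))  # {'MRR': 0.75, 'Hits@10': 1.0}
```

If ("drugX", "binds", "geneB") were also a known true triple, it would be filtered out and the rank would become 1.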


LinkExplorer: Predicting, explaining and exploring links in large biomedical knowledge graphs
TLDR
LinkExplorer is presented, a software suite for predicting, explaining and exploring links in large biomedical knowledge graphs that integrates the novel, rule-based link prediction engine SAFRAN, which was recently shown to outcompete other explainable algorithms and established black box algorithms.
Scalable and interpretable rule-based link prediction for large heterogeneous knowledge graphs
TLDR
SAFRAN yields new state-of-the-art results for fully interpretable link prediction on the established general-purpose benchmark FB15K-237 and the large-scale biomedical benchmark OpenBioLink and increases inference speeds by up to two orders of magnitude.
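Rule-based engines like SAFRAN predict links by applying learned Horn rules whose bodies are paths in the graph, and the matching paths double as explanations. The sketch below fires one hypothetical rule, `treats(X, Y) <- binds(X, Z), associated_with(Z, Y)`, with an assumed confidence on a toy graph. It only illustrates how a single path rule yields a prediction plus its supporting entities. It does not cover SAFRAN's rule learning or how it aggregates many rules.

```python
from collections import defaultdict

Triple = tuple[str, str, str]  # (head, relation, tail)


def apply_path_rule(triples: list[Triple], body: tuple[str, str], head_rel: str,
                    confidence: float) -> dict[Triple, tuple[float, list[str]]]:
    """Fire head_rel(X, Y) <- body[0](X, Z), body[1](Z, Y).

    Returns each new triple with the rule confidence and the intermediate
    entities Z that ground the rule, which serve as the explanation."""
    known = set(triples)
    out_edges: dict[tuple[str, str], list[str]] = defaultdict(list)
    for h, r, t in triples:
        out_edges[(h, r)].append(t)

    witnesses: dict[Triple, list[str]] = defaultdict(list)
    for x, r, z in triples:
        if r != body[0]:
            continue
        for y in out_edges.get((z, body[1]), []):
            cand = (x, head_rel, y)
            if cand not in known:  # report only links missing from the graph
                witnesses[cand].append(z)
    return {cand: (confidence, zs) for cand, zs in witnesses.items()}


# Hypothetical toy graph.
kg = [
    ("drugX", "binds", "geneA"),
    ("drugX", "binds", "geneB"),
    ("geneA", "associated_with", "disease1"),
    ("geneB", "associated_with", "disease1"),
    ("drugY", "binds", "geneB"),
    ("drugY", "treats", "disease1"),
]
preds = apply_path_rule(kg, ("binds", "associated_with"), "treats", confidence=0.6)
for triple, (conf, why) in preds.items():
    print(triple, conf, "via", why)
# ('drugX', 'treats', 'disease1') 0.6 via ['geneA', 'geneB']
```

The (drugY, treats, disease1) grounding is not reported because that link already exists.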
DOUBLER: Unified Representation Learning of Biological Entities and Documents for Predicting Protein–Disease Relationships
TLDR
This work proposes a system that learns consistent representations of biological entities based on a knowledge graph and additional data modalities, like structured annotations and free text describing the entities, and uses these representations to identify novel proteins associated with diseases.
Explainable Biomedical Recommendations via Reinforcement Learning Reasoning on Knowledge Graphs
TLDR
The neurosymbolic approach of multi-hop reasoning on knowledge graphs is explored for drug discovery. To draw solid conclusions on its applicability, it is systematically applied to multiple biomedical datasets and recommendation tasks, with fair benchmark comparisons for the first time.
Implications of Topological Imbalance for Representation Learning on Biomedical Knowledge Graphs
TLDR
The results highlight the importance of data modeling choices and the need for practitioners to be mindful of these issues when interpreting model outputs and composing KGs. They also support the observation that knowledge graph embedding (KGE) models can be influenced more by the frequency of entities than by any biological information encoded within the relations.
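One way to probe the frequency effect is to compare a model against a baseline that ignores relations entirely and ranks candidate entities by their degree in the training graph. The sketch below implements such a hypothetical baseline on toy triples. If a trained KGE model's rankings closely track it, entity frequency may be driving the predictions.

```python
from collections import Counter

Triple = tuple[str, str, str]  # (head, relation, tail)


def degree_counts(triples: list[Triple]) -> Counter:
    """Number of triples each entity takes part in, as head or tail."""
    deg: Counter = Counter()
    for h, _, t in triples:
        deg[h] += 1
        deg[t] += 1
    return deg


def degree_baseline_ranking(triples: list[Triple], candidates: list[str]) -> list[str]:
    """Rank candidates by training degree, highest first.

    The query entity and relation are deliberately ignored."""
    deg = degree_counts(triples)
    return sorted(candidates, key=lambda e: deg[e], reverse=True)


# Hypothetical toy training graph; TP53 is the hub.
train = [
    ("drugX", "binds", "TP53"),
    ("drugY", "binds", "TP53"),
    ("TP53", "associated_with", "cancer"),
    ("drugY", "binds", "geneB"),
]
ranking = degree_baseline_ranking(train, ["geneB", "TP53", "geneC"])
print(ranking)  # ['TP53', 'geneB', 'geneC']
```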
Task-Driven Knowledge Graph Filtering Improves Prioritizing Drugs for Repurposing
TLDR
This work proposes a method that leverages domain knowledge in the form of metapaths and uses them to filter two biomedical knowledge graphs for the purpose of improving performance on the prediction task of drug repurposing while simultaneously increasing computational efficiency.
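Metapath-based filtering can be sketched as keeping only the edges that lie on at least one instance of a chosen metapath. The example below assumes a toy graph with typed nodes and a single hypothetical metapath, Compound-binds-Gene-associates-Disease. The paper's actual metapaths and filtering criteria may differ.

```python
Triple = tuple[str, str, str]  # (head, relation, tail)


def filter_by_metapath(triples: list[Triple], node_type: dict[str, str],
                       metapath: list[tuple[str, str, str]]) -> set[Triple]:
    """Keep edges that take part in at least one full instance of `metapath`.

    `metapath` is a list of (source_type, relation, target_type) steps."""
    paths: list[list[Triple]] = [[]]  # partial path instances, as edge lists
    for src_type, rel, dst_type in metapath:
        extended = []
        for path in paths:
            for h, r, t in triples:
                if r != rel or node_type.get(h) != src_type or node_type.get(t) != dst_type:
                    continue
                if path and path[-1][2] != h:
                    continue  # the next edge must start where the path ends
                extended.append(path + [(h, r, t)])
        paths = extended
    return {edge for path in paths for edge in path}


# Hypothetical typed toy graph.
types = {"drugX": "Compound", "geneA": "Gene", "geneB": "Gene",
         "disease1": "Disease", "tissue1": "Anatomy"}
kg = [
    ("drugX", "binds", "geneA"),
    ("geneA", "associates", "disease1"),
    ("drugX", "binds", "geneB"),
    ("geneB", "expressed_in", "tissue1"),
]
kept = filter_by_metapath(kg, types, [("Compound", "binds", "Gene"),
                                      ("Gene", "associates", "Disease")])
print(sorted(kept))
# [('drugX', 'binds', 'geneA'), ('geneA', 'associates', 'disease1')]
```

The drugX-geneB edge is dropped because geneB has no path onward to a disease, which is what shrinks the graph and the training cost.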
A Review of Biomedical Datasets Relating to Drug Discovery: A Knowledge Graph Perspective
TLDR
This review presents a comparative analysis of existing public drug discovery KGs and an evaluation of selected motivating case studies from the literature. It raises numerous unique challenges and issues associated with the domain and its datasets, whilst also highlighting key future research directions.
Causal reasoning over knowledge graphs leveraging drug-perturbed and disease-specific transcriptomic signatures for drug discovery
TLDR
RPath is presented, a novel algorithm that prioritizes drugs for a given disease by reasoning over causal paths in a knowledge graph (KG), guided by both drug-perturbed as well as disease-specific transcriptomic signatures.
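Reasoning over causal paths can be illustrated with signed edges, where +1 means "increases" and -1 means "decreases". The cumulative effect of a drug on each node along a path is then the product of the edge signs so far. The sketch below is a simplified illustration and not the RPath algorithm. It enumerates simple drug-to-disease paths and keeps those that agree with a hypothetical drug-perturbed signature at every measured gene and that end by decreasing the disease. The disease-specific signature that RPath also uses is omitted for brevity.

```python
Edge = tuple[str, str, int]  # (source, target, sign): +1 increases, -1 decreases


def simple_paths(edges: list[Edge], source: str, target: str, max_len: int) -> list[list[str]]:
    """All simple directed paths from source to target with at most max_len edges."""
    adj: dict[str, list[str]] = {}
    for s, t, _ in edges:
        adj.setdefault(s, []).append(t)
    found, stack = [], [[source]]
    while stack:  # depth-first search over partial paths
        path = stack.pop()
        node = path[-1]
        if node == target and len(path) > 1:
            found.append(path)
            continue
        if len(path) - 1 == max_len:
            continue
        for nxt in adj.get(node, []):
            if nxt not in path:  # no revisits, so paths stay simple
                stack.append(path + [nxt])
    return found


def concordant_paths(edges: list[Edge], drug: str, disease: str,
                     drug_signature: dict[str, int], max_len: int = 3) -> list[list[str]]:
    """Paths consistent with measured drug effects that end by decreasing the disease."""
    sign_of = {(s, t): sign for s, t, sign in edges}
    kept = []
    for path in simple_paths(edges, drug, disease, max_len):
        effect, consistent = 1, True
        for a, b in zip(path, path[1:]):
            effect *= sign_of[(a, b)]  # cumulative effect of the drug on node b
            observed = drug_signature.get(b)
            if observed is not None and observed != effect:
                consistent = False  # contradicts the measured perturbation
                break
        if consistent and effect == -1:
            kept.append(path)
    return kept


# Hypothetical signed toy graph and measured up/down directions.
edges = [
    ("drugX", "geneA", -1),
    ("geneA", "disease1", +1),
    ("drugX", "geneB", +1),
    ("geneB", "disease1", +1),
]
signature = {"geneA": -1, "geneB": -1}
result = concordant_paths(edges, "drugX", "disease1", signature)
print(result)  # [['drugX', 'geneA', 'disease1']]
```

The path through geneB is rejected because it predicts geneB goes up while the signature says it goes down.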
PharmKG: a dedicated knowledge graph benchmark for biomedical data mining
TLDR
This work introduced PharmKG, a multi-relational, attributed biomedical KG, composed of more than 500 000 individual interconnections between genes, drugs and diseases, with 29 relation types over a vocabulary of ~8000 disambiguated entities, and established a comprehensive KG system for the biomedical field.

References

Showing 1-10 of 18 references
BioKEEN: a library for learning and evaluating biological knowledge graph embeddings
TLDR
This work developed BioKEEN and PyKEEN to make knowledge graph embedding models easy to use through an interactive command-line interface, and presents a case study in which a novel biological pathway mapping resource is used to predict links that represent pathway crosstalks and hierarchies.
Bio2RDF Release 3: A larger, more connected network of Linked Data for the Life Sciences
TLDR
This report describes the third coordinated release of ∼11 billion triples across 30 biomedical databases and datasets, a 10-fold increase in the number of triples since Bio2RDF Release 2 (Jan 2013).
Neural networks for link prediction in realistic biomedical graphs: a multi-dimensional evaluation of graph embedding-based approaches
TLDR
An investigation of how inputs from four node representation algorithms affect the performance of a neural link predictor on random- and time-sliced biomedical graphs of real-world size, containing information relevant to drug–target interactions (DTI), protein–protein interactions (PPI) and literature-based discovery (LBD), showed that neural network methods performed well on links between nodes with no previous common neighbours, potentially the most interesting links.
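The difference between random and time-sliced evaluation, and the notion of links between nodes with no common neighbours, can be made concrete with a small sketch. It assumes each undirected edge carries a hypothetical year in which it was first reported. Edges up to a cutoff year form the training graph, later edges form the test set, and each test edge is annotated with the number of neighbours its endpoints already share in the training graph.

```python
from collections import defaultdict

TimedEdge = tuple[str, str, int]  # (node_u, node_v, year first reported), undirected


def time_sliced_split(edges: list[TimedEdge], cutoff: int):
    """Train on edges reported up to `cutoff`, test on edges reported after it."""
    train = [(u, v) for u, v, year in edges if year <= cutoff]
    test = [(u, v) for u, v, year in edges if year > cutoff]
    return train, test


def common_neighbours(train: list[tuple[str, str]],
                      test: list[tuple[str, str]]) -> dict[tuple[str, str], int]:
    """Number of shared training-graph neighbours for each test edge."""
    nbrs: dict[str, set[str]] = defaultdict(set)
    for u, v in train:
        nbrs[u].add(v)
        nbrs[v].add(u)
    return {(u, v): len(nbrs.get(u, set()) & nbrs.get(v, set())) for u, v in test}


# Hypothetical timestamped toy edges.
edges = [("A", "B", 2010), ("B", "C", 2011), ("A", "D", 2012),
         ("A", "C", 2016), ("C", "E", 2017)]
train, test = time_sliced_split(edges, cutoff=2015)
cn = common_neighbours(train, test)
print(cn)  # {('A', 'C'): 1, ('C', 'E'): 0}
```

Test edges with a count of 0, like C-E here, are the "no previous common neighbours" case that simple neighbourhood heuristics cannot score.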
Heterogeneous Network Edge Prediction: A Data Integration Approach to Prioritize Disease-Associated Genes
TLDR
Heterogeneous network edge prediction effectively prioritized genetic associations and provides a powerful new approach for data integration across multiple domains.
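Edge prediction on heterogeneous networks builds features from metapaths. One feature used in this line of work is the degree-weighted path count (DWPC), which down-weights paths through high-degree nodes. The sketch below computes a DWPC for one metapath on a toy graph. It assumes undirected edges, degrees counted per relation, and a damping exponent `w` of 0.4 chosen here for illustration. Treat it as an illustration rather than a reference implementation.

```python
from collections import defaultdict

Edge = tuple[str, str, str]  # (node_u, relation, node_v), treated as undirected


def dwpc(edges: list[Edge], source: str, target: str,
         metapath: list[str], w: float = 0.4) -> float:
    """Degree-weighted path count from `source` to `target` along `metapath`.

    `metapath` is a sequence of relation names. Each path contributes the
    product over its edges of (deg(u) * deg(v)) ** -w, where degrees are
    counted only over edges of that edge's relation."""
    adj: dict[tuple[str, str], set[str]] = defaultdict(set)
    for u, rel, v in edges:
        adj[(u, rel)].add(v)
        adj[(v, rel)].add(u)

    def degree(node: str, rel: str) -> int:
        return len(adj.get((node, rel), set()))

    # Frontier items: (current node, nodes visited so far, running product).
    frontier = [(source, (source,), 1.0)]
    for rel in metapath:
        extended = []
        for node, visited, weight in frontier:
            for nb in adj.get((node, rel), set()):
                if nb in visited:
                    continue  # count only paths without repeated nodes
                factor = (degree(node, rel) * degree(nb, rel)) ** -w
                extended.append((nb, visited + (nb,), weight * factor))
        frontier = extended
    return sum((weight for node, _, weight in frontier if node == target), 0.0)


# Hypothetical toy network.
edges = [
    ("geneX", "interacts", "geneA"),
    ("geneX", "interacts", "geneB"),
    ("geneA", "associates", "disease1"),
    ("geneB", "associates", "disease1"),
    ("geneB", "associates", "disease2"),
]
value = dwpc(edges, "geneX", "disease1", ["interacts", "associates"])
print(round(value, 4))  # 1.0096, i.e. 2**-0.8 + 2**-1.2
```

With `w=0.0` every path weighs 1, so the DWPC reduces to the plain path count (2 here). The path through geneB counts for less because geneB is also linked to disease2.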
Systematic integration of biomedical knowledge prioritizes drugs for repurposing
TLDR
The ability to computationally predict whether a compound treats a disease would improve the economy and success rate of drug approval. Predictions that provide pharmacological insights on epilepsy are described, suggesting the approach will help prioritize drug repurposing candidates.
Convolutional 2D Knowledge Graph Embeddings
TLDR
ConvE, a multi-layer convolutional network model for link prediction, is introduced and it is found that ConvE achieves state-of-the-art Mean Reciprocal Rank across most datasets.
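ConvE scores a triple by reshaping the head and relation embeddings into 2D grids and stacking them. It then convolves the result, projects the feature maps back to the embedding dimension, and takes a dot product with the tail embedding. The sketch below is a forward pass only, written in plain Python with tiny random (untrained) parameters. It omits dropout, batch normalization and bias terms. All sizes and the `rand_vec` helper are illustrative assumptions.

```python
import math
import random


def reshape(vec, rows, cols):
    """Turn a flat embedding into a rows x cols grid (row-major)."""
    return [vec[i * cols:(i + 1) * cols] for i in range(rows)]


def conv2d_valid(img, kernel):
    """2D cross-correlation with no padding and stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [sum(img[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
         for j in range(len(img[0]) - kw + 1)]
        for i in range(len(img) - kh + 1)
    ]


def relu(x):
    return max(0.0, x)


def conve_score(head, rel, tail, filters, W, rows, cols):
    """ConvE-style logit: relu(vec(relu([h; r] * filters)) W) . t"""
    # Stack the reshaped head and relation grids vertically: (2*rows) x cols.
    img = reshape(head, rows, cols) + reshape(rel, rows, cols)
    # Convolve with each filter, apply ReLU and flatten all feature maps.
    feats = [relu(v) for k in filters for row in conv2d_valid(img, k) for v in row]
    # Project back to the embedding dimension; W has len(feats) rows, d columns.
    d = len(tail)
    hidden = [relu(sum(feats[i] * W[i][j] for i in range(len(feats)))) for j in range(d)]
    return sum(h * t for h, t in zip(hidden, tail))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rand_vec(n):
    return [random.uniform(-1, 1) for _ in range(n)]


random.seed(0)
d, rows, cols = 8, 2, 4  # 8-dim embeddings reshaped to 2 x 4 grids
h, r, t = rand_vec(d), rand_vec(d), rand_vec(d)
filters = [[rand_vec(3) for _ in range(3)] for _ in range(2)]  # two 3x3 filters
# The stacked image is 4 x 4; each 3x3 filter gives 2 x 2, so 2 * 4 = 8 features.
W = [rand_vec(d) for _ in range(8)]
print(sigmoid(conve_score(h, r, t, filters, W, rows, cols)))  # probability in (0, 1)
```

Because the tail only enters through the final dot product, the hidden vector can be computed once per (head, relation) pair and scored against all entities, which is what makes ConvE efficient at evaluation time.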
Graph embedding on biomedical networks: methods, applications and evaluations
TLDR
Compared with three state-of-the-art methods for drug–disease associations (DDAs), drug–drug interactions (DDIs) and protein function prediction, the recent graph embedding methods achieve competitive performance without using any biological features, and the learned embeddings can be treated as complementary representations for the biological features.
Neuro-symbolic representation learning on biological knowledge graphs
TLDR
This work develops a novel method for feature learning on biological knowledge graphs that combines symbolic methods, in particular knowledge representation using symbolic logic and automated reasoning, with neural networks to generate embeddings of nodes that encode for related information within knowledge graphs.
Translating Embeddings for Modeling Multi-relational Data
TLDR
TransE is proposed, a method that models relationships by interpreting them as translations operating on the low-dimensional embeddings of the entities. Extensive experiments show that TransE significantly outperforms state-of-the-art methods in link prediction on two knowledge bases.
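TransE's translation idea fits in a few lines. A triple (h, r, t) is plausible when h + r ≈ t, so its dissimilarity is the distance between h + r and t. Training uses a margin-based ranking loss, max(0, γ + d(positive) − d(corrupted)). The sketch below uses hand-picked 2D embeddings for hypothetical entities (not trained values) and the L1 norm by default.

```python
def transe_distance(h, r, t, p=1):
    """L_p distance between h + r and t; lower means more plausible."""
    return sum(abs(hi + ri - ti) ** p for hi, ri, ti in zip(h, r, t)) ** (1 / p)


def margin_loss(pos_dist, neg_dist, margin=1.0):
    """Hinge loss pushing a corrupted triple at least `margin` further than the true one."""
    return max(0.0, margin + pos_dist - neg_dist)


# Hypothetical hand-picked embeddings.
emb = {"drugX": [0.0, 0.0], "diseaseA": [1.0, 0.0], "diseaseB": [0.0, 1.0]}
treats = [1.0, 0.1]

d_pos = transe_distance(emb["drugX"], treats, emb["diseaseA"])  # true triple
d_neg = transe_distance(emb["drugX"], treats, emb["diseaseB"])  # corrupted tail
print(d_pos, d_neg, margin_loss(d_pos, d_neg))
# roughly 0.1 and 1.9; the corrupted triple is already far enough, so the loss is 0.0
```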
Learning Entity and Relation Embeddings for Knowledge Graph Completion
TLDR
TransR is proposed, which builds entity and relation embeddings in a separate entity space and relation-specific spaces and performs translations between projected entities. The models are evaluated on three tasks: link prediction, triple classification and relational fact extraction.
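TransR differs from TransE by first projecting entities into a relation-specific space with a matrix M_r, then translating there: the score is ||h M_r + r − t M_r||². The sketch below projects a hypothetical 3D entity space down to a 2D space for one relation, using hand-picked values.

```python
def project(e, M):
    """Row vector e (length k) times matrix M (k x d), giving a length-d vector."""
    return [sum(e[i] * M[i][j] for i in range(len(e))) for j in range(len(M[0]))]


def transr_distance(h, r, t, M_r):
    """Squared L2 distance ||h M_r + r - t M_r||^2 in the relation space."""
    h_r, t_r = project(h, M_r), project(t, M_r)
    return sum((a + b - c) ** 2 for a, b, c in zip(h_r, r, t_r))


# Hypothetical projection that ignores the third entity dimension for this relation.
M_binds = [[1.0, 0.0],
           [0.0, 1.0],
           [0.0, 0.0]]
binds = [1.0, 0.0]
h = [0.0, 0.0, 5.0]
t = [1.0, 0.0, -3.0]
print(transr_distance(h, binds, t, M_binds))  # 0.0
```

The head and tail differ strongly in the third dimension, yet the triple scores as perfect because M_binds discards that dimension. This is how TransR lets each relation focus on different aspects of an entity.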