Onto2Vec: joint vector-based representation of biological entities and their ontology-based annotations

Authors: Fatima Zohra Smaili, Xin Gao, R. Hoehndorf
Pages: i52-i60
Motivation: Biological knowledge is widely represented in the form of ontology‐based annotations: ontologies describe the phenomena assumed to exist within a domain, and the annotations associate a (kind of) biological entity with a set of phenomena within the domain. […] Key Method: Our method can be applied to a wide range of bioinformatics research problems such as similarity‐based prediction of interactions between proteins, classification of interaction types using supervised learning, or clustering. To…
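
As a rough illustration of what similarity-based prediction over entity vectors can look like, the sketch below ranks candidate proteins by cosine similarity to a query protein. The embedding values, the protein names, and the helper `rank_candidates` are hypothetical and are not taken from the paper; how the vectors are learned is not shown here.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings (assumed values, for illustration only).
embeddings = {
    "protein_A": [0.9, 0.1, 0.3],
    "protein_B": [0.8, 0.2, 0.4],
    "protein_C": [-0.5, 0.9, 0.0],
}

def rank_candidates(query, embeddings):
    """Rank every other entity by cosine similarity to the query entity."""
    scores = [
        (name, cosine_similarity(embeddings[query], vector))
        for name, vector in embeddings.items()
        if name != query
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

ranking = rank_candidates("protein_A", embeddings)
```

In a setting like this, the top-ranked candidates would be treated as the most likely interaction partners; a supervised classifier could instead take the vectors themselves as input features.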

OPA2Vec: combining formal and informal content of biomedical ontologies to improve similarity-based prediction

It is demonstrated that OPA2Vec significantly outperforms existing methods for predicting gene-disease associations and can be used to produce vector representations of any biomedical entity given any type of biomedical ontology.

OntoSem: an Ontology Semantic Representation Methodology for Biomedical Domain

This work proposes a novel distributional semantic representation methodology based on the combination of two pre-trained and domain-specific word embedding tools, the non-contextualized Word2Vec and the context-dependent NCBI-blueBERT, to enhance the encoding ability for biomedical ontologies.

GoVec: Gene Ontology Representation Learning Using Weighted Heterogeneous Graph and Meta-Path

GoVec is presented to produce representations seamlessly for both ontologies and biological entities by utilizing meta-path-based representation learning in the heterogeneous graph, and the capability of GoVec representations to embed functional semantics into the vectors is shown.

Evolving knowledge graph similarity for supervised learning in complex biomedical domains

A novel approach, evoKGsim, is developed that applies Genetic Programming over a set of semantic similarity features, each based on a semantic aspect of the data, to obtain the best combination for a given supervised learning task.

OWL2Vec*: Embedding of OWL Ontologies

A random walk and word embedding based ontology embedding method, which encodes the semantics of an OWL ontology by taking into account its graph structure, lexical information and logical constructors.
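
The random-walk idea can be sketched in a few lines: walks over a graph view of the ontology are written out as token sequences, which a word embedding model could then consume as sentences. The toy graph, the class names, and the functions below are hypothetical; this is not the OWL2Vec* implementation, and the lexical and logical components as well as the embedding training step are left out.

```python
import random

# Hypothetical toy graph: class -> list of (relation, neighbouring class).
graph = {
    "Class_A": [("subClassOf", "Class_B"), ("part_of", "Class_C")],
    "Class_B": [("subClassOf", "Class_D")],
    "Class_C": [("subClassOf", "Class_D")],
    "Class_D": [],
}

def random_walk(graph, start, max_steps, rng):
    """Follow up to max_steps random outgoing edges, recording relations and nodes."""
    walk = [start]
    node = start
    for _ in range(max_steps):
        edges = graph.get(node, [])
        if not edges:
            break  # dead end: no outgoing edges
        relation, node = rng.choice(edges)
        walk.extend([relation, node])
    return walk

def generate_corpus(graph, walks_per_node, max_steps, seed=0):
    """Generate several walks from every node; each walk is one 'sentence'."""
    rng = random.Random(seed)
    return [
        random_walk(graph, node, max_steps, rng)
        for node in graph
        for _ in range(walks_per_node)
    ]

corpus = generate_corpus(graph, walks_per_node=2, max_steps=3)
```

Each element of `corpus` is a list of tokens that alternates class names and relation names, which is the shape a word embedding trainer would expect as input.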

Predicting Gene-Disease Associations with Knowledge Graph Embeddings over Multiple Ontologies

This work investigates the impact of employing richer semantic representations that are based on more than one ontology, able to represent both genes and diseases and consider multiple kinds of relations within the ontologies.

Machine learning with biomedical ontologies

An overview of the methods that use ontologies to compute similarity and incorporate them in machine learning methods is provided, which outlines how semantic similarity measures and ontology embeddings can exploit the background knowledge in biomedical ontologies, and how ontologies can provide constraints that improve machine learning models.

Supervised Semantic Similarity

This work presents a new approach that uses supervised machine learning to tailor aspect-oriented semantic similarity measures to fit a particular view on biological similarity or relatedness, and demonstrates that this approach outperforms non-supervised methods.

Learning representations for gene ontology terms by jointly encoding graph structure and textual node descriptors

A novel representation model for GO terms, named GT2Vec, is proposed. It jointly considers the GO graph structure obtained by graph contrastive learning and the semantic description of GO terms based on BERT encoders, and demonstrates the effectiveness of jointly encoding graph structure and textual node descriptors to learn vector representations for GO terms.

HiG2Vec: Hierarchical Representations of Gene Ontology and Genes in the Poincaré Ball

Hierarchical representations of GO and genes (HiG2Vec) are proposed that apply Poincaré embedding, which is specialized in the representation of hierarchy, through a two-step procedure; they are shown to be superior to other methods in capturing the GO and gene semantics and in data utilization.
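
For readers unfamiliar with the Poincaré ball, the distance that makes it suited to hierarchies can be computed as below. This is the standard Poincaré ball distance formula and only a minimal sketch; it is not HiG2Vec's code, and the function name and the example points are assumptions made for illustration.

```python
import math

def poincare_distance(u, v):
    """Distance between two points strictly inside the unit ball:
    arcosh(1 + 2*|u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))."""
    sq_norm_u = sum(a * a for a in u)
    sq_norm_v = sum(b * b for b in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))

# Hypothetical points: a general term near the origin, specific terms near the boundary.
general_term = [0.0, 0.0]
specific_term_1 = [0.9, 0.0]
specific_term_2 = [0.0, 0.9]

d_parent_child = poincare_distance(general_term, specific_term_1)
d_siblings = poincare_distance(specific_term_1, specific_term_2)
```

Distances grow rapidly near the boundary, so two specific terms placed there end up far from each other while each stays comparatively close to a general term near the origin, which mirrors a tree-like hierarchy.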



Semantic Similarity in Biomedical Ontologies

This work reviews semantic similarity measures applied to biomedical ontologies and proposes their classification according to the strategies they employ: node-based versus edge-based and pairwise versus groupwise.
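
To make the node-based and pairwise categories concrete, the sketch below computes Resnik's measure, a commonly used node-based measure that scores two classes by the information content of their most informative common ancestor, and then combines pairwise scores with a best-match average to compare two annotated entities. The toy ontology, the annotation probabilities, and the function names are hypothetical, and the choice of Resnik's measure is an illustrative assumption rather than something named in the summary above.

```python
import math

# Hypothetical toy ontology: class -> set of direct parents.
parents = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
}

# Hypothetical probabilities: fraction of entities annotated with a class or its descendants.
probability = {"root": 1.0, "A": 0.5, "B": 0.5, "A1": 0.25, "A2": 0.25}

def ancestors(term):
    """Return the term together with all of its ancestors."""
    result = {term}
    for parent in parents[term]:
        result |= ancestors(parent)
    return result

def information_content(term):
    """Rarely used classes carry more information: IC(c) = -log p(c)."""
    return -math.log(probability[term])

def resnik(term1, term2):
    """Node-based similarity: IC of the most informative common ancestor."""
    common = ancestors(term1) & ancestors(term2)
    return max(information_content(c) for c in common)

def best_match_average(terms1, terms2):
    """Pairwise strategy: average the best match of each term, in both directions."""
    mean1 = sum(max(resnik(a, b) for b in terms2) for a in terms1) / len(terms1)
    mean2 = sum(max(resnik(a, b) for a in terms1) for b in terms2) / len(terms2)
    return (mean1 + mean2) / 2.0

similarity = best_match_average(["A1", "B"], ["A2"])
```

A groupwise measure would instead compare the two annotation sets directly, for example through the overlap of their ancestor sets, without first scoring individual class pairs.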

A gene ontology inferred from molecular networks

It is shown that large networks of gene and protein interactions in Saccharomyces cerevisiae can be used to infer an ontology whose coverage and power are equivalent to those of the manually curated Gene Ontology (GO).

Investigating Semantic Similarity Measures Across the Gene Ontology: The Relationship Between Sequence and Annotation

This paper investigates the use of ontological annotation to measure the similarities in knowledge content or 'semantic similarity' between entries in a data resource and shows a simple extension that enables a semantic search of the knowledge held within sequence databases.

Evaluating the effect of annotation size on measures of semantic similarity

It is found that most similarity measures are sensitive to the number of annotations of entities, difference in annotation size as well as to the depth of annotation classes; well-studied and richly annotated entities will usually show higher similarity than entities with only few annotations even in the absence of any biological relation.

The role of ontologies in biological and biomedical research: a functional perspective

A functional perspective on ontologies in biology and biomedicine is provided, focusing on what ontologies can do and describing how they can be used in support of integrative research.

Neuro-symbolic representation learning on biological knowledge graphs

This work develops a novel method for feature learning on biological knowledge graphs that combines symbolic methods, in particular knowledge representation using symbolic logic and automated reasoning, with neural networks to generate embeddings of nodes that encode for related information within knowledge graphs.

Gene Ontology Annotations and Resources

The Gene Ontology (GO) Consortium is a community-based bioinformatics resource that classifies gene product function through the use of structured, controlled vocabularies and has been expanded not only to cover new areas of biology through focused interaction with experts, but also to capture greater specificity in all areas of the ontology.

Inferring ontology graph structures using OWL reasoning

Onto2Graph is a method to transform ontologies into graphs using an automated reasoner while taking into account all relations between classes, and can be used for improved ontology visualization and ontology-based data analysis.

Ontology-driven similarity approaches to supporting gene functional assessment

Alternative techniques for measuring ontology-driven similarity of gene products are discussed, along with relationships between these types of similarity information and key functional properties, such as gene co-expression.

Metrics for GO based protein semantic similarity: a systematic evaluation

This work presents a systematic evaluation of GO-based semantic similarity measures, using the relationship with sequence similarity as a means to quantify their performance, and assesses the influence of electronic annotations by testing the measures in the presence and absence of these annotations.