Measuring Relatedness Between Scientific Entities in Annotation Datasets

Abstract

Linked Open Data has made available a diversity of scientific collections where scientists have annotated entities in the datasets with controlled vocabulary terms (CV terms) from ontologies. These semantic annotations encode scientific knowledge which is captured in annotation datasets. One can mine these datasets to discover relationships and patterns between entities. Determining the relatedness (or similarity) between entities becomes a building block for graph pattern mining, e.g., identifying drug-drug relationships could depend on the similarity of the diseases (conditions) that are associated with each drug. Diverse similarity metrics have been proposed in the literature, e.g., i) string-similarity metrics; ii) path-similarity metrics; iii) topological-similarity metrics; all measure relatedness in a given taxonomy or ontology. In this paper, we consider a novel annotation similarity metric AnnSim that measures the relatedness between two entities in terms of the similarity of their annotations. We model AnnSim as a 1-to-1 maximal weighted bipartite match, and we exploit properties of existing solvers to provide an efficient solution. We empirically study the effectiveness of AnnSim on real-world datasets of genes and their GO annotations, clinical trials, and a human disease benchmark. Our results suggest that AnnSim can provide a deeper understanding of the relatedness of concepts and can provide an explanation of potential novel patterns.

DOI: 10.1145/2506583.2506651

Extracted Key Phrases

12 Figures and Tables

Cite this paper

@inproceedings{Palma2013MeasuringRB, title={Measuring Relatedness Between Scientific Entities in Annotation Datasets}, author={Guillermo Palma and Maria-Esther Vidal and Eric Haag and Louiqa Raschid and Andreas Thor}, booktitle={BCB}, year={2013} }