• Publications
Framework for Evaluating Clustering Algorithms in Duplicate Detection
TLDR
In this paper, we use Stringer to evaluate the quality of the clusters (groups of potential duplicates) obtained from several unconstrained clustering algorithms used in concert with approximate join techniques.
  • 197
  • 22
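To illustrate the pipeline summarized above (an approximate join that finds similar record pairs, followed by clustering of those pairs into groups of potential duplicates), here is a minimal Python sketch. It assumes a naive q-gram Jaccard self-join and transitive-closure (connected-components) clustering, which is one of the simplest unconstrained clustering algorithms; the function names and the 0.5 threshold are hypothetical, and this is not Stringer's code.

```python
from itertools import combinations


def qgrams(s, q=2):
    """Set of q-grams of a lowercased string padded with '#' on both ends."""
    padded = f"#{s.lower()}#"
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


def similarity_join(records, threshold=0.5, q=2):
    """Naive O(n^2) approximate self-join: index pairs with q-gram Jaccard >= threshold."""
    grams = [qgrams(r, q) for r in records]
    return [(i, j) for i, j in combinations(range(len(records)), 2)
            if jaccard(grams[i], grams[j]) >= threshold]


def transitive_closure_clusters(n, pairs):
    """Group record indices into connected components of the similarity graph (union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in pairs:
        parent[find(i)] = find(j)
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


records = ["John Smith", "Jon Smith", "Mary Jones", "Marie Jones", "Alice"]
print(transitive_closure_clusters(len(records), similarity_join(records)))
# [[0, 1], [2, 3], [4]]
```

Transitive closure is the weakest choice here: a single chain of borderline pairs can merge two distinct entities, which is exactly the kind of cluster-quality difference a framework like Stringer is meant to measure across algorithms.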
Linked Movie Data Base
TLDR
We present a novel way of creating and maintaining large quantities of high-quality links by employing state-of-the-art approximate join techniques to find links, and by providing additional RDF metadata about the quality of the links.
  • 133
  • 14
Toward a complete dataset of drug-drug interaction information from publicly available sources
TLDR
All publicly available sources of PDDI information that could be identified through a comprehensive and broad search were combined into a single dataset.
  • 90
  • 10
Linked open drug data for pharmaceutical research and development
TLDR
In this paper, we present past and ongoing work on Linking Open Drug Data (LODD) and discuss the growing importance of Linked Data as a foundation for pharmaceutical R&D data sharing.
  • 182
  • 9
Schema Management for Document Stores
TLDR
We present a schema management framework for document stores that discovers and persists schemas of JSON records in a repository, and also supports queries and schema summarization.
  • 57
  • 9
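A minimal Python sketch of the idea summarized above (discover a structural schema for each JSON record, persist the distinct schemas in a repository, and answer simple queries and summaries over them) might look like the following. The schema representation, the `SchemaRepository` class, and its methods are hypothetical illustrations, not the framework's actual design or API.

```python
import json
from collections import Counter


def infer_schema(value):
    """Map a JSON value to a hashable structural schema (types only, no values)."""
    if isinstance(value, dict):
        return ("object", tuple(sorted((k, infer_schema(v)) for k, v in value.items())))
    if isinstance(value, list):
        # Union of element schemas, sorted by repr so the result is deterministic.
        return ("array", tuple(sorted({infer_schema(v) for v in value}, key=repr)))
    if isinstance(value, bool):  # check bool before number: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    return "null"


class SchemaRepository:
    """Hypothetical in-memory store of distinct record schemas and their frequencies."""

    def __init__(self):
        self.counts = Counter()

    def add(self, raw_json):
        """Discover the schema of one JSON record and persist it (with a count)."""
        schema = infer_schema(json.loads(raw_json))
        self.counts[schema] += 1
        return schema

    def schemas_with_attribute(self, name):
        """Query: distinct top-level object schemas that contain attribute `name`."""
        return [s for s in self.counts
                if s[0] == "object" and any(k == name for k, _ in s[1])]

    def summarize(self):
        """Summary: top-level attribute -> number of records that contain it."""
        summary = Counter()
        for schema, n in self.counts.items():
            if schema[0] == "object":
                for key, _ in schema[1]:
                    summary[key] += n
        return dict(summary)


repo = SchemaRepository()
for doc in ['{"id": 1, "name": "a"}', '{"id": 2, "name": "b"}', '{"id": 3, "tags": ["x"]}']:
    repo.add(doc)
print(len(repo.counts), repo.summarize())
# 2 {'id': 3, 'name': 2, 'tags': 1}
```

Storing each distinct schema once with a count keeps the repository far smaller than the collection itself when most records share a few shapes.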
Matching Web Tables with Knowledge Base Entities: From Entity Lookups to Entity Embeddings
TLDR
In this paper, we evaluate three unsupervised annotation methods: (a) a lookup-based method which relies on the minimal entity context provided in Web tables to discover correspondences to the KB, (b) a semantic embeddings method that exploits a vectorial representation of the rich entity context in a KB to identify the most relevant subset of entities in the Web table, and (c) an ontology matching method, which exploits schematic and instance information.
  • 47
  • 9
Linking Open Drug Data
TLDR
The development of new therapies for diseases requires the integration of large amounts of biomedical data from many different sources.
  • 69
  • 7
Instance-Based Matching of Large Ontologies Using Locality-Sensitive Hashing
TLDR
In this paper, we describe a mechanism for ontology alignment using instance-based matching of types (or classes).
  • 46
  • 7
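As a rough illustration of instance-based type matching with locality-sensitive hashing, the Python sketch below represents each type (class) by the set of its instance values, computes a MinHash signature per type, and bands the signatures so that only types from the two ontologies that collide in some band are compared. The seeded-hash scheme, the 64 hash functions, the 16 bands, and all function names are assumptions chosen for illustration, not the paper's parameters or implementation.

```python
import hashlib
from collections import defaultdict
from itertools import product


def _hash(token, seed):
    """64-bit hash of a token under one of several seeded hash functions."""
    digest = hashlib.blake2b(f"{seed}:{token}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash(values, num_perm=64):
    """MinHash signature of a non-empty instance set: minimum hash per seeded function."""
    return [min(_hash(v, seed) for v in values) for seed in range(num_perm)]


def estimated_jaccard(sig1, sig2):
    """Fraction of agreeing signature positions estimates the Jaccard similarity."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)


def lsh_candidate_pairs(types_a, types_b, num_perm=64, bands=16):
    """Band the signatures; types from the two ontologies sharing any bucket become candidates."""
    rows = num_perm // bands
    buckets = defaultdict(lambda: (set(), set()))  # bucket key -> (types from A, types from B)
    signatures = ({}, {})
    for side, types in enumerate((types_a, types_b)):
        for name, instances in types.items():
            sig = minhash(instances, num_perm)
            signatures[side][name] = sig
            for b in range(bands):
                buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))][side].add(name)
    pairs = set()
    for left, right in buckets.values():
        pairs.update(product(left, right))  # only cross-ontology pairs
    return pairs, signatures


types_a = {"a:City": {"paris", "london", "berlin"}, "a:Person": {"einstein", "curie", "newton"}}
types_b = {"b:Town": {"paris", "london", "berlin"}, "b:Scientist": {"einstein", "curie", "newton"}}
pairs, sigs = lsh_candidate_pairs(types_a, types_b)
print(sorted(pairs))
```

With b bands of r rows, two types with Jaccard similarity s become candidates with probability 1 - (1 - s^r)^b, so the band/row split trades recall against the number of type pairs that must be verified.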
Creating probabilistic databases from duplicated data
TLDR
We present a flexible, modular framework for scalably creating a probabilistic database out of a dirty relation of duplicated data, and give an overview of the challenges raised in using this framework for large relations of string data.
  • 86
  • 5
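One simple way to picture the last step of such a framework, turning clusters of duplicates into a probabilistic relation, is sketched below in Python. It assumes the clusters are already given and that a record's probability of being the clean representative is proportional to its average similarity to the other members of its cluster; both the weighting rule and the use of `difflib.SequenceMatcher` as the similarity function are illustrative assumptions, not the paper's method.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """String similarity in [0, 1] (difflib's ratio; a stand-in for any similarity measure)."""
    return SequenceMatcher(None, a, b).ratio()


def assign_probabilities(clusters):
    """Turn clusters of duplicates into (cluster_id, record, probability) rows.

    Assumption: a record's probability is proportional to its average similarity to the
    other members; probabilities within a cluster sum to 1, so each cluster acts as one
    uncertain tuple whose alternatives are mutually exclusive.
    """
    rows = []
    for cid, members in enumerate(clusters):
        if len(members) == 1:
            rows.append((cid, members[0], 1.0))
            continue
        scores = [
            sum(similarity(members[i], members[j]) for j in range(len(members)) if j != i)
            / (len(members) - 1)
            for i in range(len(members))
        ]
        total = sum(scores)
        for record, score in zip(members, scores):
            # Fall back to a uniform distribution if every pairwise similarity is zero.
            prob = score / total if total > 0 else 1.0 / len(members)
            rows.append((cid, record, prob))
    return rows


for row in assign_probabilities([["John Smith", "Jon Smith", "J. Smith"], ["Alice"]]):
    print(row)
```

Because the weighting is a separate function, a different similarity measure or probability model can be swapped in without touching the clustering step, which is the kind of modularity the summary above emphasizes.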
Discovering Linkage Points over Web Data
TLDR
We present a framework consisting of a library of efficient lexical analyzers and similarity functions, and a set of search algorithms for effective and efficient identification of linkage points over Web data.
  • 34
  • 5
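The combination described in the last summary (lexical analyzers, a similarity function, and a search over attribute pairs) can be sketched in Python as an exhaustive search: every attribute of one source is paired with every attribute of the other, each analyzer tokenizes both value sets, and pairs whose best overlap reaches a threshold are reported as candidate linkage points. The three analyzers, the overlap coefficient, the 0.5 threshold, and all names are hypothetical choices for illustration; the paper's analyzers and search algorithms are not reproduced here.

```python
import re
from itertools import product

# Hypothetical library of lexical analyzers: each maps a raw value to a set of tokens.
ANALYZERS = {
    "exact": lambda v: {v},
    "lowercase": lambda v: {v.strip().lower()},
    "word_tokens": lambda v: set(re.findall(r"\w+", v.lower())),
}


def analyze(values, analyzer):
    """Union of the tokens an analyzer produces for all values of one attribute."""
    tokens = set()
    for v in values:
        tokens |= analyzer(v)
    return tokens


def overlap(a, b):
    """Overlap coefficient |A & B| / min(|A|, |B|); 0.0 if either set is empty."""
    return len(a & b) / min(len(a), len(b)) if a and b else 0.0


def find_linkage_points(source_a, source_b, threshold=0.5):
    """Score every attribute pair under every analyzer and keep the pairs above threshold.

    source_a / source_b: dict mapping attribute name -> list of string values.
    Returns (attr_a, attr_b, analyzer_name, score) tuples, highest score first.
    """
    results = []
    for (attr_a, vals_a), (attr_b, vals_b) in product(source_a.items(), source_b.items()):
        score, name = max(
            (overlap(analyze(vals_a, f), analyze(vals_b, f)), name)
            for name, f in ANALYZERS.items()
        )
        if score >= threshold:
            results.append((attr_a, attr_b, name, score))
    return sorted(results, key=lambda r: -r[3])


movies_a = {"title": ["The Matrix", "Inception"], "year": ["1999", "2010"]}
movies_b = {"name": ["the matrix", "inception"], "released": ["1999", "2010"]}
for link in find_linkage_points(movies_a, movies_b):
    print(link)
```

The exhaustive loop costs one comparison per attribute pair per analyzer, which is why more efficient search strategies matter once Web sources expose hundreds of attributes.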