Corpus ID: 7191087

Disjunctive Normal Form Schemes for Heterogeneous Attributed Graphs

Author: Mayank Kejriwal
Several 'edge-discovery' applications over graph-based data models are known to have worst-case quadratic complexity, even if the discovered edges are sparse. One example is the generic link discovery problem between two graphs, which has invited research interest in several communities. Specific versions of this problem include link prediction in social networks, ontology alignment between metadata-rich RDF data, approximate joins, and entity resolution between instance-rich data. As large…
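To make the quadratic-complexity point concrete: a disjunctive normal form (DNF) blocking scheme is a disjunction of conjunctions of simple predicates, and two nodes become a candidate pair only if some conjunction gives them a shared blocking key. The following is a minimal sketch under stated assumptions, not the paper's algorithm. Each predicate is a hypothetical (attribute, indexing function) pair, the indexing functions `tokens` and `first3` and the flat record layout are illustrative, and one inverted index per conjunction replaces the enumeration of all |left| × |right| pairs.

```python
from collections import defaultdict
from itertools import product

# Hypothetical indexing functions: each maps an attribute value to a set of blocking keys.
def tokens(value):
    return set(value.lower().split())

def first3(value):
    return {value.lower()[:3]} if value else set()

def conjunction_keys(record, conjunction):
    """Blocking keys for one conjunction: the cross product of its predicates' keys."""
    key_sets = []
    for attr, index_fn in conjunction:
        keys = index_fn(record.get(attr, ""))
        if not keys:
            return set()  # a missing or empty attribute means the conjunction cannot fire
        key_sets.append(sorted(keys))
    return set(product(*key_sets))

def dnf_candidate_pairs(left, right, scheme):
    """(i, j) pairs that share a key under at least one conjunction of the DNF scheme."""
    pairs = set()
    for conjunction in scheme:
        index = defaultdict(list)  # inverted index: key -> positions in `right`
        for j, rec in enumerate(right):
            for key in conjunction_keys(rec, conjunction):
                index[key].append(j)
        for i, rec in enumerate(left):
            for key in conjunction_keys(rec, conjunction):
                for j in index.get(key, ()):
                    pairs.add((i, j))
    return pairs

left = [{"name": "John Smith", "city": "Boston"}, {"name": "Ann Lee", "city": "Denver"}]
right = [{"name": "Jon Smith", "city": "Boston"}, {"name": "Anne Lee", "city": "Dallas"}]
scheme = [[("name", tokens), ("city", first3)], [("city", tokens)]]
print(dnf_candidate_pairs(left, right, scheme))  # {(0, 0)}
```

Each added conjunction can only add candidate pairs (it is a union), so a scheme trades recall against the number of comparisons it admits; with `[[("name", tokens)]]` alone, the pair (1, 1) is also produced through the shared token "lee".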
MinoanER: Schema-Agnostic, Non-Iterative, Massively Parallel Resolution of Web Entities
The MinoanER framework is proposed, which simultaneously fulfills full automation, support of highly heterogeneous entities, and massive parallelization of the ER process, and leverages a token-based similarity of entities to define a new metric that derives the similarity of neighboring entities from the most important relations.
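A schema-agnostic token similarity and a neighbor similarity of the kind described can be sketched as follows. This is a hypothetical illustration: the weighting `1 / log2(f1 * f2 + 1)`, which favors rare shared tokens, the entity layout (`values` and `links` fields), and the caller-chosen list of "important" relations are all assumptions, and MinoanER's exact metric may differ.

```python
import math
from collections import Counter

def entity_tokens(entity):
    """Lowercase tokens from all literal values of an entity, ignoring attribute names."""
    toks = set()
    for value in entity["values"]:
        toks.update(value.lower().split())
    return toks

def token_frequencies(entities):
    """Number of entities in which each token occurs."""
    freq = Counter()
    for e in entities:
        freq.update(entity_tokens(e))
    return freq

def value_sim(e1, e2, f1, f2):
    """Sum over shared tokens; rarer tokens contribute more (assumed weighting)."""
    shared = entity_tokens(e1) & entity_tokens(e2)
    return sum(1.0 / math.log2(f1[t] * f2[t] + 1) for t in shared)

def neighbor_sim(e1, e2, kb1, kb2, f1, f2, relations):
    """Best value_sim among neighbor pairs reached through the given relations."""
    best = 0.0
    for rel in relations:
        for n1 in e1["links"].get(rel, []):
            for n2 in e2["links"].get(rel, []):
                best = max(best, value_sim(kb1[n1], kb2[n2], f1, f2))
    return best

kb1 = {"a": {"values": ["Heraklion Crete"], "links": {"locatedIn": ["b"]}},
       "b": {"values": ["Greece"], "links": {}}}
kb2 = {"x": {"values": ["Heraklion city"], "links": {"locatedIn": ["y"]}},
       "y": {"values": ["Greece"], "links": {}}}
f1, f2 = token_frequencies(kb1.values()), token_frequencies(kb2.values())
print(value_sim(kb1["a"], kb2["x"], f1, f2))  # 1.0
```

Because the tokens come from values only, the same code works whether or not the two sources share a schema, which is the point of a schema-agnostic similarity.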


Link mining: a survey
While network analysis has been studied in depth in particular areas such as social network analysis, hypertext mining, and web analysis, only recently has there been a cross-fertilization of ideas among these different communities.
An unsupervised instance matcher for schema-free RDF data
An unsupervised system that performs instance matching between entities in schema-free Resource Description Framework (RDF) files; it automatically generates its own heuristic training set and is shown to compete effectively with adaptive supervised approaches.
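One plausible way to generate a heuristic training set without labels is to score cross-source pairs by TF-IDF cosine similarity and treat the highest-scoring pairs as pseudo-positives and the lowest-scoring as pseudo-negatives. The sketch assumes this heuristic, entities flattened to plain strings, and a hypothetical parameter `k`; it is not claimed to be the system's exact procedure.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (dicts) using a smoothed idf of log(1 + n / df)."""
    tokenized = [d.lower().split() for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))
    n = len(docs)
    return [{t: c * math.log(1 + n / df[t]) for t, c in Counter(toks).items()}
            for toks in tokenized]

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def heuristic_training_set(source, target, k):
    """Top-k pairs by cosine as pseudo-positives, bottom-k as pseudo-negatives."""
    vecs = tfidf_vectors(source + target)
    s_vecs, t_vecs = vecs[:len(source)], vecs[len(source):]
    scored = sorted(((cosine(s, t), i, j)
                     for i, s in enumerate(s_vecs)
                     for j, t in enumerate(t_vecs)), reverse=True)
    positives = [(i, j) for _, i, j in scored[:k]]
    negatives = [(i, j) for _, i, j in scored[-k:]]
    return positives, negatives

source = ["Barack Obama president", "Paris capital France"]
target = ["Obama Barack", "France Paris city"]
print(heuristic_training_set(source, target, 2))
```

If `k` exceeds half the number of pairs, the two lists overlap, so in practice `k` would be kept small relative to the candidate set.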
A Blocking Framework for Entity Resolution in Highly Heterogeneous Information Spaces
This paper systematizes blocking methods for clean-clean ER (an inherently quadratic task) over highly heterogeneous information spaces (HHIS) through a novel framework that consists of two orthogonal layers: the effectiveness layer encompasses methods for building overlapping blocks with small likelihood of missed matches, and the efficiency layer comprises a rich variety of techniques that significantly restrict the required number of pairwise comparisons.
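The two layers can be illustrated with one technique from each: schema-agnostic token blocking (one block per token, effectiveness layer), then block purging and comparison deduplication (efficiency layer). This is a minimal sketch; the profile format (flat attribute dictionaries) and the threshold `max_comparisons` are assumptions, and the framework itself covers many more methods.

```python
from collections import defaultdict

def profile_tokens(profile):
    """Tokens from all attribute values, regardless of attribute names."""
    return set(" ".join(profile.values()).lower().split())

def token_blocks(source1, source2):
    """Effectiveness layer: one block per token, kept only if both sources contribute."""
    blocks = defaultdict(lambda: (set(), set()))
    for i, profile in enumerate(source1):
        for tok in profile_tokens(profile):
            blocks[tok][0].add(i)
    for j, profile in enumerate(source2):
        for tok in profile_tokens(profile):
            blocks[tok][1].add(j)
    # Clean-clean ER only compares across sources, so one-sided blocks are useless.
    return {t: b for t, b in blocks.items() if b[0] and b[1]}

def purge_blocks(blocks, max_comparisons):
    """Efficiency layer: drop oversized blocks, which are usually stop-word-like tokens."""
    return {t: b for t, b in blocks.items() if len(b[0]) * len(b[1]) <= max_comparisons}

def distinct_comparisons(blocks):
    """Efficiency layer: a pair co-occurring in several blocks is compared only once."""
    return {(i, j) for s1, s2 in blocks.values() for i in s1 for j in s2}

s1 = [{"name": "Apple iPhone 5", "brand": "Apple"}, {"name": "Galaxy S4", "brand": "Samsung"}]
s2 = [{"title": "iPhone 5 by Apple"}, {"title": "Samsung Galaxy S4"}, {"title": "Nokia Lumia"}]
blocks = token_blocks(s1, s2)
print(len(distinct_comparisons(blocks)), "of", len(s1) * len(s2), "pairs compared")
```

Here six overlapping blocks reduce to two distinct comparisons out of six possible pairs, because each true match appears in several blocks but is executed once.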
Structure and attribute index for approximate graph matching in large graphs
This paper proposes a novel structure-aware and attribute-aware index to process approximate graph matching in a large attributed graph and builds an index on the similarity of the attributed graph by partitioning the large search space into smaller subgraphs based on structure similarity and attribute similarity.
Adaptive Blocking: Learning to Scale Up Record Linkage
This paper introduces an adaptive framework for automatically learning blocking functions that are efficient and accurate, and describes two predicate-based formulations of learnable blocking functions and provides learning algorithms for training them.
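"Efficient and accurate" is usually quantified with two measures: pairs completeness (the fraction of true matches that the blocking function keeps) and reduction ratio (the fraction of all pairs it skips). The sketch evaluates a disjunctive predicate-based blocking function on a small labeled sample. The predicates `same_zip` and `common_name_token`, the record layout, and the labels are hypothetical, and the evaluation loop itself is quadratic, which is acceptable only on a training sample.

```python
from itertools import product

# Hypothetical pairwise predicates: True means the two records would share a block.
def same_zip(a, b):
    return a["zip"] == b["zip"]

def common_name_token(a, b):
    return bool(set(a["name"].lower().split()) & set(b["name"].lower().split()))

def disjunctive_blocker(predicates):
    """A blocking function that fires if any of its predicates fires."""
    return lambda a, b: any(p(a, b) for p in predicates)

def evaluate(blocker, left, right, true_matches):
    """Return (pairs completeness, reduction ratio) of `blocker` on labeled data."""
    candidates = {(i, j)
                  for (i, a), (j, b) in product(enumerate(left), enumerate(right))
                  if blocker(a, b)}
    pc = len(candidates & true_matches) / len(true_matches)
    rr = 1 - len(candidates) / (len(left) * len(right))
    return pc, rr

left = [{"name": "Acme Corp", "zip": "10001"}, {"name": "Beta LLC", "zip": "94105"}]
right = [{"name": "ACME Corporation", "zip": "10001"},
         {"name": "Gamma Inc", "zip": "60601"},
         {"name": "Beta Limited", "zip": "94105"}]
true_matches = {(0, 0), (1, 2)}
pc, rr = evaluate(disjunctive_blocker([same_zip, common_name_token]), left, right, true_matches)
print(pc, round(rr, 3))  # 1.0 0.667
```

A learner for such functions searches for the predicate combination that keeps pairs completeness high while pushing the reduction ratio up.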
A time-efficient hybrid approach to link discovery
This paper presents a novel hybrid approach to link discovery that combines two very fast algorithms by using original insights on the translation of complex link specifications to combinations of atomic specifications via a series of operations on sets and filters.
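The translation of complex link specifications into set operations can be sketched as follows. An atomic specification returns the set of (source, target) pairs whose similarity on one property pair reaches a threshold. AND, OR, and MINUS then become intersection, union, and difference, and a filter re-checks a condition only on pairs that already passed. The naive nested loop in `atomic`, the Jaccard measure, and the property names are illustrative assumptions; the paper's approach relies on much faster algorithms for the atomic step.

```python
def jaccard(x, y):
    """Token Jaccard similarity, one possible atomic measure."""
    a, b = set(x.lower().split()), set(y.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def atomic(measure, prop_s, prop_t, threshold):
    """Atomic spec: all pairs whose measure on the two properties reaches the threshold."""
    def run(source, target):
        return {(s, t) for s, rs in source.items() for t, rt in target.items()
                if measure(rs[prop_s], rt[prop_t]) >= threshold}
    return run

def and_spec(a, b):
    return lambda src, tgt: a(src, tgt) & b(src, tgt)

def or_spec(a, b):
    return lambda src, tgt: a(src, tgt) | b(src, tgt)

def minus_spec(a, b):
    return lambda src, tgt: a(src, tgt) - b(src, tgt)

def filter_spec(spec, measure, prop_s, prop_t, threshold):
    """Filter: evaluate a further condition only on pairs that `spec` already returned."""
    def run(source, target):
        return {(s, t) for s, t in spec(source, target)
                if measure(source[s][prop_s], target[t][prop_t]) >= threshold}
    return run

source = {"s1": {"label": "Berlin", "country": "Germany"},
          "s2": {"label": "Paris", "country": "France"}}
target = {"t1": {"name": "Berlin", "country": "Germany"},
          "t2": {"name": "Paris", "country": "United States"}}
by_label = atomic(jaccard, "label", "name", 1.0)
by_country = atomic(jaccard, "country", "country", 0.5)
print(sorted(and_spec(by_label, by_country)(source, target)))  # [('s1', 't1')]
```

The filtered form `filter_spec(by_label, jaccard, "country", "country", 0.5)` yields the same links as the conjunction but computes the country similarity only for the label-matched pairs, which is why rewriting conjunctions as filters can save time.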
Stochastic Relational Models for Discriminative Link Prediction
A Gaussian process (GP) framework, stochastic relational models (SRM), for learning social, physical, and other relational phenomena where interactions between entities are observed is introduced and extensions of SRM to general relational learning tasks are discussed.
An Unsupervised Algorithm for Learning Blocking Schemes
An unsupervised method for learning a blocking scheme for tabular data sets is developed; compared with a state-of-the-art supervised blocking key discovery algorithm on three real-world databases, it achieves favorable results.
Adoption of the Linked Data Best Practices in Different Topical Domains
It is found that the number of linked datasets has approximately doubled between 2011 and 2014, that there is increased agreement on common vocabularies for describing certain types of entities, and that provenance and license metadata is still rarely provided by the data sources.
Learning Blocking Schemes for Record Linkage
This paper presents a machine learning approach to automatically learn effective blocking schemes and validates the approach with experiments showing that the learned blocking schemes outperform the ad-hoc blocking schemes of non-experts and perform comparably to those manually built by a domain expert.
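Learning a blocking scheme from labeled pairs can be framed as greedy sequential covering: repeatedly add the candidate conjunction that covers the most still-uncovered true matches, while admitting at most a budget of labeled non-matches, and stop when nothing useful remains. The sketch follows that general idea only. The candidate conjunctions, the predicates, the sample pairs, and the `max_non_matches` budget are hypothetical, and the paper's exact learner may differ.

```python
def same_zip(a, b):
    return a["zip"] == b["zip"]

def same_initial(a, b):
    return a["name"][:1].lower() == b["name"][:1].lower()

def common_token(a, b):
    return bool(set(a["name"].lower().split()) & set(b["name"].lower().split()))

def covers(conjunction, a, b):
    """A conjunction covers a pair if every one of its predicates fires."""
    return all(p(a, b) for p in conjunction)

def learn_scheme(candidates, matches, non_matches, max_non_matches):
    """Greedily build a disjunction of conjunctions that covers the labeled matches."""
    uncovered = list(matches)
    scheme = []
    while uncovered:
        best, best_gain = None, 0
        for conj in candidates:
            cost = sum(covers(conj, a, b) for a, b in non_matches)
            if cost > max_non_matches:
                continue  # admits too many non-matches into blocks
            gain = sum(covers(conj, a, b) for a, b in uncovered)
            if gain > best_gain:
                best, best_gain = conj, gain
        if best is None:
            break  # no admissible conjunction covers a new match
        scheme.append(best)
        uncovered = [(a, b) for a, b in uncovered if not covers(best, a, b)]
    return scheme

matches = [({"name": "Acme Corp", "zip": "10001"}, {"name": "ACME Corporation", "zip": "10001"}),
           ({"name": "Beta LLC", "zip": "94105"}, {"name": "Beta Limited", "zip": "94110"})]
non_matches = [({"name": "Acme Corp", "zip": "10001"}, {"name": "Delta Inc", "zip": "10001"}),
               ({"name": "Beta LLC", "zip": "94105"}, {"name": "Bravo Co", "zip": "94105"})]
candidates = [(same_zip,), (common_token,), (same_zip, same_initial)]
print([[p.__name__ for p in conj] for conj in learn_scheme(candidates, matches, non_matches, 0)])
```

With a zero budget, `(same_zip,)` and `(same_zip, same_initial)` are rejected because each lets a labeled non-match into a block, so the learner returns only the token-overlap conjunction, which covers both matches.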