# Principled Graph Matching Algorithms for Integrating Multiple Data Sources

@article{Zhang2015PrincipledGM, title={Principled Graph Matching Algorithms for Integrating Multiple Data Sources}, author={Duo Zhang and Benjamin I. P. Rubinstein and Jim Gemmell}, journal={IEEE Transactions on Knowledge and Data Engineering}, year={2015}, volume={27}, pages={2784-2796} }

This paper explores combinatorial optimization for problems of max-weight graph matching on multi-partite graphs, which arise in integrating multiple data sources. In the most common two-source case, it is often desirable for the final matching to be one-to-one; the database and statistical record linkage communities accomplish this by weighted bipartite graph matching on similarity scores. Such matchings are intuitively appealing: they leverage a natural global property of many real-world…
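As a minimal sketch of the two-source case described above, the following Python example finds a one-to-one matching that maximizes total similarity. It is an exhaustive search for illustration only, not the paper's algorithm; the function names and the `scores` matrix are hypothetical, and the search is only practical for very small inputs.

```python
from itertools import permutations


def max_weight_one_to_one(scores):
    """Exhaustively find the one-to-one matching with the largest total score.

    scores[i][j] is a hypothetical similarity between record i of source A
    and record j of source B. Assumes source A has no more records than B.
    """
    n_a, n_b = len(scores), len(scores[0])
    best_total, best_pairs = float("-inf"), []
    # Each permutation assigns a distinct column (B record) to every row (A record).
    for cols in permutations(range(n_b), n_a):
        total = sum(scores[i][j] for i, j in enumerate(cols))
        if total > best_total:
            best_total, best_pairs = total, list(enumerate(cols))
    return best_total, best_pairs


def best_match_per_record(scores):
    """Pick the highest-scoring B record for each A record independently."""
    return [(i, max(range(len(row)), key=row.__getitem__))
            for i, row in enumerate(scores)]


# Made-up similarity scores for three records in each source.
scores = [
    [0.90, 0.80, 0.10],
    [0.85, 0.20, 0.10],
    [0.10, 0.30, 0.70],
]

total, pairs = max_weight_one_to_one(scores)
independent = best_match_per_record(scores)
```

On this toy matrix, picking each record's best match independently assigns records 0 and 1 of source A to the same record of source B, whereas the global matching keeps the assignment one-to-one.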

## 17 Citations

Multidimensional Assignment Problem for multipartite entity resolution

- Computer Science
- ArXiv
- 2021

This work derives a mathematical formulation for a general class of record linkage problems in multipartite entity resolution across many datasets as a combinatorial optimization problem known as the multidimensional assignment problem, and shows that very large scale search, especially its multi-start version, outperforms a simple greedy heuristic.

InfMatch: Finding isomorphism subgraph on a big target graph based on the importance of vertex

- Computer Science
- Physica A: Statistical Mechanics and its Applications
- 2019

This paper proposes a subgraph matching algorithm based on node influence, denoted InfMatch, to improve the performance of subgraph matching on a large target graph, and proposes several filter strategies according to the characteristics of the method.

Information Recovery in Shuffled Graphs via Graph Matching

- Mathematics, Computer Science
- IEEE Transactions on Information Theory
- 2018

An information theoretic foundation is provided for understanding the practical impact that errorfully observed vertex correspondences can have on subsequent inference, and the capacity of graph matching methods to recover the lost vertex alignment and inferential performance.

A Comparative Study of Subgraph Matching Isomorphic Methods in Social Networks

- Computer Science
- IEEE Access
- 2018

Five typical graph matching methods are chosen to evaluate their performance and scalability in social networks, and it is found that VF2 and RI are applicable to rather small graphs, while SPath performs better on large graphs when the average degree of the graph is small.

Initialization and Coordinate Optimization for Multi-way Matching

- Mathematics, Computer Science
- AISTATS
- 2017

This work proposes a coordinate update algorithm that directly optimizes the target objective by using pairwise alignment information to build an undirected graph and initializing the permutation matrices along the edges of its Maximum Spanning Tree, which successfully avoids bad local optima.

Structural Constraints for Multipartite Entity Resolution with Markov Logic Network

- Computer Science
- CIKM
- 2015

This work proposes a principled solution to the multipartite entity resolution problem, building on the foundation of Markov Logic Network (MLN) that combines probabilistic graphical model and first-order logic.

Crowdsourced Collective Entity Resolution with Relational Match Propagation

- Computer Science
- 2020 IEEE 36th International Conference on Data Engineering (ICDE)
- 2020

This paper proposes a novel approach called crowdsourced collective ER, which iteratively asks human workers to label picked entity pairs and propagates the labeling information to their neighbors in distance and achieves superior accuracy with much less labeling.

Clink - A Novel Record Linkage Methodology based on Graph Interactions

- Computer Science
- DATA
- 2017

This study proposes calculating similarity scores based not only on string similarity techniques or topological graph similarity, but also on graph interactions between nodes, in order to achieve better linkage results.

Neighborhood-Aware Attentional Representation for Multilingual Knowledge Graphs

- Computer Science
- IJCAI
- 2019

This paper incorporates neighborhood subgraph-level information of entities, and proposes a neighborhood-aware attentional representation method NAEA for multilingual knowledge graphs that significantly and consistently outperforms state-of-the-art entity alignment models.

COTSAE: CO-Training of Structure and Attribute Embeddings for Entity Alignment

- Computer Science
- AAAI
- 2020

This work proposes COTSAE that combines the structure and attribute information of entities by co-training two embedding learning components, respectively, and proposes a joint attention method in the model to learn the attentions of attribute types and values cooperatively.

## References

Improving Entity Resolution with Global Constraints

- Computer Science
- ArXiv
- 2011

This paper investigates another socio-economic property that has not yet been exploited: sites that create lists of entities, such as IMDB and Netflix, have an incentive to avoid gratuitous duplicates, and finds that this property is leveraged to resolve entities across the different web sites, and that it can obtain substantial improvements in resolution accuracy.

Constraint-Based Entity Matching

- Computer Science
- AAAI
- 2005

A novel combination of EM and relaxation labeling algorithms that efficiently learns the model, thereby matching mentions in an unsupervised way, without the need for annotated training data; the solution scales up to large data sets.

Max-Product for Maximum Weight Matching: Convergence, Correctness, and LP Duality

- Mathematics, Computer Science
- IEEE Transactions on Information Theory
- 2008

This paper proves the correctness and convergence of max-product for finding the maximum weight matching (MWM) in bipartite graphs and provides a bound on the number of iterations required. It is shown that for a graph of size n, the computational cost of the algorithm scales as O(n^3), which is the same as the computational cost of the best known algorithms for finding the MWM.

Scaling multiple-source entity resolution using statistically efficient transfer learning

- Computer Science
- CIKM
- 2012

This work addresses the prohibitive cost of labeling training data for supervised learning of similarity scores for each pair of sources with a brand new transfer learning algorithm which requires far less training data and achieves superior accuracy with the same data and is trained using fast convex optimization.

A Hierarchical Graphical Model for Record Linkage

- Computer Science, Mathematics
- UAI
- 2004

This paper describes a hierarchical graphical model framework for the record-linkage problem in an unsupervised setting, and proposes new methods to minimize overfitting and describes a method for incorporating monotonicity constraints in a graphical model.

Exploiting context analysis for combining multiple entity resolution systems

- Computer Science
- SIGMOD Conference
- 2009

A new ER Ensemble framework that employs two novel combining approaches, which are based on supervised learning, to combine the results of multiple base-level ER systems into a single solution with the goal of increasing the quality of ER.

Approximation algorithms for three-dimensional assignment problems with triangle inequalities

- 2002

The three-dimensional assignment problem (3DA) is defined as follows. Given are three disjoint n-sets of points, and nonnegative costs associated with every triangle consisting of exactly one point…

Record Matching over Query Results from Multiple Web Databases

- Computer Science
- IEEE Transactions on Knowledge and Data Engineering
- 2010

An unsupervised, online record matching method, UDD, which can effectively identify duplicates from the query result records of multiple Web databases; experimental results show that UDD works well for the Web database scenario where existing supervised methods do not apply.

On the Complexity of Approximating k-Dimensional Matching

- Mathematics, Computer Science
- RANDOM-APPROX
- 2003

It is proved that k-DM cannot be efficiently approximated to within a factor of O(k / ln k) unless P = NP, and NP-hardness factors of 4-DM, 5-DM and 6-DM are proved.

A Latent Dirichlet Model for Unsupervised Entity Resolution

- Computer Science
- SDM
- 2006

This work proposes a novel sampling algorithm for collective entity resolution which is unsupervised and also takes entity relations into account, and demonstrates the utility and practicality of the relational entity resolution approach for author resolution in two real-world bibliographic datasets.