# Principled Graph Matching Algorithms for Integrating Multiple Data Sources

@article{Zhang2015PrincipledGM,
title={Principled Graph Matching Algorithms for Integrating Multiple Data Sources},
author={Duo Zhang and Benjamin I. P. Rubinstein and Jim Gemmell},
journal={IEEE Transactions on Knowledge and Data Engineering},
year={2015},
volume={27},
pages={2784-2796}
}
• Published 2 February 2014
• Computer Science, Mathematics
• IEEE Transactions on Knowledge and Data Engineering
This paper explores combinatorial optimization for problems of max-weight graph matching on multi-partite graphs, which arise in integrating multiple data sources. In the most common two-source case, it is often desirable for the final matching to be one-to-one; the database and statistical record linkage communities accomplish this by weighted bipartite graph matching on similarity scores. Such matchings are intuitively appealing: they leverage a natural global property of many real-world…
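As a minimal sketch of the two-source case described above, the snippet below finds a one-to-one matching that maximizes total similarity. It is an illustration only, not the paper's algorithm: it uses exhaustive search, which is practical only for tiny inputs, and the `scores` matrix and the function name `best_one_to_one_matching` are hypothetical.

```python
from itertools import permutations


def best_one_to_one_matching(scores):
    """Exhaustively search for the max-weight one-to-one matching.

    scores[i][j] is an assumed similarity between record i of source A
    and record j of source B. Assumes source A has no more records than
    source B. Brute force: only suitable for very small inputs.
    """
    n_a, n_b = len(scores), len(scores[0])
    best_total, best_pairs = float("-inf"), []
    # Each permutation assigns a distinct column (B record) to every row (A record).
    for cols in permutations(range(n_b), n_a):
        total = sum(scores[i][j] for i, j in enumerate(cols))
        if total > best_total:
            best_total, best_pairs = total, list(enumerate(cols))
    return best_total, best_pairs


# Hypothetical similarity scores: rows 0 and 1 both prefer column 0,
# so the one-to-one constraint forces a global trade-off.
scores = [
    [0.90, 0.80, 0.10],
    [0.85, 0.20, 0.10],
    [0.10, 0.30, 0.70],
]
total, pairs = best_one_to_one_matching(scores)
print(pairs)
```

In this toy input, matching each record independently to its highest-scoring partner would link two records of source A to the same record of source B; the one-to-one constraint resolves that conflict by maximizing the total score instead.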
Multidimensional Assignment Problem for multipartite entity resolution
• Computer Science
ArXiv
• 2021
This work derives a mathematical formulation for a general class of record linkage problems in multipartite entity resolution across many datasets as a combinatorial optimization problem known as the multidimensional assignment problem, and shows that very large scale search, especially its multi-start version, outperforms a simple greedy heuristic.
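For orientation, a greedy baseline of the kind mentioned above can be sketched as follows. This is an assumed, generic greedy heuristic for multipartite matching, not the cited work's code: the function name, the toy `sources` data, and the `toy_score` function are all hypothetical.

```python
from itertools import product


def greedy_multipartite_matching(score, sizes):
    """Greedy heuristic: repeatedly take the highest-scoring tuple whose
    records are all still unmatched. Enumerates every tuple, so it is
    only meant for small illustrative inputs.
    """
    candidates = sorted(
        product(*(range(n) for n in sizes)), key=score, reverse=True
    )
    used = [set() for _ in sizes]  # matched record indices, per source
    chosen = []
    for tup in candidates:
        if all(rec not in used[src] for src, rec in enumerate(tup)):
            chosen.append(tup)
            for src, rec in enumerate(tup):
                used[src].add(rec)
    return chosen


# Hypothetical data: three sources, each holding two numeric "records".
sources = [[1.0, 5.0], [1.1, 5.2], [0.9, 4.9]]


def toy_score(tup):
    # Assumed score: negative sum of pairwise absolute differences.
    a, b, c = (sources[s][r] for s, r in enumerate(tup))
    return -(abs(a - b) + abs(b - c) + abs(a - c))


matches = greedy_multipartite_matching(toy_score, [len(s) for s in sources])
print(matches)
```

Greedy selection commits to each tuple permanently, which is why search methods that revisit earlier choices can improve on it.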
InfMatch: Finding isomorphism subgraph on a big target graph based on the importance of vertex
• Computer Science
Physica A: Statistical Mechanics and its Applications
• 2019
This paper proposes InfMatch, a subgraph matching algorithm based on node influence, to improve the performance of subgraph matching on a large target graph, and proposes several filter strategies suited to the characteristics of the method.
Information Recovery in Shuffled Graphs via Graph Matching
• V. Lyzinski
• Mathematics, Computer Science
IEEE Transactions on Information Theory
• 2018
An information theoretic foundation is provided for understanding the practical impact that errorfully observed vertex correspondences can have on subsequent inference, and the capacity of graph matching methods to recover the lost vertex alignment and inferential performance.
A Comparative Study of Subgraph Matching Isomorphic Methods in Social Networks
• Computer Science
IEEE Access
• 2018
Five typical graph matching methods are chosen to evaluate their performance and scalability in social networks, and it is found that VF2 and RI are applicable to rather small graphs, while SPath performs better on large graphs when the average degree of the graph is small.
Initialization and Coordinate Optimization for Multi-way Matching
• Mathematics, Computer Science
AISTATS
• 2017
This work proposes a coordinate update algorithm that directly optimizes the target objective by using pairwise alignment information to build an undirected graph and initializing the permutation matrices along the edges of its Maximum Spanning Tree, which successfully avoids bad local optima.
Structural Constraints for Multipartite Entity Resolution with Markov Logic Network
• Computer Science
CIKM
• 2015
This work proposes a principled solution to the multipartite entity resolution problem, building on the foundation of Markov Logic Networks (MLNs), which combine probabilistic graphical models and first-order logic.
Crowdsourced Collective Entity Resolution with Relational Match Propagation
• Computer Science
2020 IEEE 36th International Conference on Data Engineering (ICDE)
• 2020
This paper proposes a novel approach called crowdsourced collective ER, which iteratively asks human workers to label picked entity pairs and propagates the labeling information to their neighbors in distance and achieves superior accuracy with much less labeling.
• Computer Science
DATA
• 2017
This study proposes calculating similarity scores based not only on string similarity techniques or topological graph similarity, but also on graph interactions between nodes, to achieve better linkage results.
Neighborhood-Aware Attentional Representation for Multilingual Knowledge Graphs
• Computer Science
IJCAI
• 2019
This paper incorporates neighborhood subgraph-level information of entities, and proposes a neighborhood-aware attentional representation method NAEA for multilingual knowledge graphs that significantly and consistently outperforms state-of-the-art entity alignment models.
COTSAE: CO-Training of Structure and Attribute Embeddings for Entity Alignment
• Computer Science
AAAI
• 2020
This work proposes COTSAE that combines the structure and attribute information of entities by co-training two embedding learning components, respectively, and proposes a joint attention method in the model to learn the attentions of attribute types and values cooperatively.

## References

Improving Entity Resolution with Global Constraints
• Computer Science
ArXiv
• 2011
This paper investigates another socio-economic property that has not yet been exploited: sites that create lists of entities, such as IMDB and Netflix, have an incentive to avoid gratuitous duplicates, and finds that this property is leveraged to resolve entities across the different web sites, and that it can obtain substantial improvements in resolution accuracy.
Constraint-Based Entity Matching
A novel combination of EM and relaxation labeling algorithms efficiently learns the model, thereby matching mentions in an unsupervised way, without the need for annotated training data; the solution scales up to large data sets.
Max-Product for Maximum Weight Matching: Convergence, Correctness, and LP Duality
• Mathematics, Computer Science
IEEE Transactions on Information Theory
• 2008
This paper proves the correctness and convergence of max-product for finding the maximum weight matching (MWM) in bipartite graphs and provides a bound on the number of iterations required, and it is shown that for a graph of size n, the computational cost of the algorithm scales as O(n^3), which is the same as the computational cost of the best known algorithms for finding the MWM.
Scaling multiple-source entity resolution using statistically efficient transfer learning
• Computer Science
CIKM
• 2012
This work addresses the prohibitive cost of labeling training data for supervised learning of similarity scores for each pair of sources with a brand new transfer learning algorithm which requires far less training data and achieves superior accuracy with the same data and is trained using fast convex optimization.
A Hierarchical Graphical Model for Record Linkage
• Computer Science, Mathematics
UAI
• 2004
This paper describes a hierarchical graphical model framework for the record-linkage problem in an unsupervised setting, and proposes new methods to minimize overfitting and describes a method for incorporating monotonicity constraints in a graphical model.
Exploiting context analysis for combining multiple entity resolution systems
• Computer Science
SIGMOD Conference
• 2009
A new ER Ensemble framework that employs two novel combining approaches, which are based on supervised learning, to combine the results of multiple base-level ER systems into a single solution with the goal of increasing the quality of ER.
Approximation algorithms for three-dimensional assignment problems with triangle inequalities
The three-dimensional assignment problem (3DA) is defined as follows. Given are three disjoint n-sets of points, and nonnegative costs associated with every triangle consisting of exactly one point
Record Matching over Query Results from Multiple Web Databases
• Computer Science
IEEE Transactions on Knowledge and Data Engineering
• 2010
An unsupervised, online record matching method, UDD, is presented that can effectively identify duplicates from the query result records of multiple Web databases; experimental results show that UDD works well for the Web database scenario where existing supervised methods do not apply.
On the Complexity of Approximating k-Dimensional Matching
• Mathematics, Computer Science
RANDOM-APPROX
• 2003
It is proved that k-DM cannot be efficiently approximated to within a factor of O(k / ln k) unless P = NP, and NP-hardness factors of 4-DM, 5-DM and 6-DM are proved.
A Latent Dirichlet Model for Unsupervised Entity Resolution
• Computer Science
SDM
• 2006
This work proposes a novel sampling algorithm for collective entity resolution which is unsupervised and also takes entity relations into account, and demonstrates the utility and practicality of the relational entity resolution approach for author resolution in two real-world bibliographic datasets.