Machop: an end-to-end generalized entity matching framework

  title={Machop: an end-to-end generalized entity matching framework},
  author={Jin Wang and Yuliang Li and Wataru Hirota and Eser Kandogan},
  journal={Proceedings of the Fifth International Workshop on Exploiting Artificial Intelligence Techniques for Data Management},
  • Published 10 June 2022
  • Computer Science
Real-world applications frequently need to solve a general form of the Entity Matching (EM) problem in order to find associated entities. Such scenarios include matching jobs to candidates in job targeting, students to courses in online education, and products to user reviews on e-commerce websites, among others. These tasks impose new requirements, such as matching data entries with diverse formats or supporting a flexible, semantics-rich matching definition, which are beyond the… 

Machamp: A Generalized Entity Matching Benchmark

This paper formulates a new research problem, Generalized Entity Matching (GEM), and creates Machamp, a benchmark that for the first time allows researchers to evaluate EM techniques between data collections with different structures.

Entity Matching with Transformer Architectures - A Step Forward in Data Integration

This paper empirically compares the capability of transformer architectures and transfer learning on the task of EM, and shows that transformer architectures outperform classical deep learning methods in EM by an average margin of 27.5%.

Deep Learning for Entity Matching: A Design Space Exploration

The results show that DL does not outperform current solutions on structured EM, but it can significantly outperform them on textual and dirty EM, which suggests that practitioners should seriously consider using DL for textual and dirty EM problems.

Deep Sequence-to-Sequence Entity Matching for Heterogeneous Entity Resolution

This paper proposes an align-compare-aggregate neural network for Seq2Seq entity matching, which can learn the representations of tokens, capture the semantic relevance between tokens, and aggregate matching evidence for accurate ER decisions in an end-to-end manner.
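
The align, compare, and aggregate stages can be illustrated with a toy sketch over plain token-embedding lists. This is not the paper's network: fixed dot-product soft attention for alignment, element-wise absolute difference for comparison, and sum pooling for aggregation are our own simplifying assumptions, and the function names are hypothetical. In a trained model the embeddings and the comparison layers would be learned, and the aggregated evidence would feed a match/no-match classifier.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def align_compare_aggregate(seq_a, seq_b):
    """Toy align-compare-aggregate over two lists of token vectors (hypothetical sketch)."""
    dim = len(seq_a[0])
    evidence = [0.0] * dim
    for tok in seq_a:
        # Align: soft attention of this token over every token in seq_b.
        weights = softmax([dot(tok, other) for other in seq_b])
        aligned = [sum(w * other[d] for w, other in zip(weights, seq_b)) for d in range(dim)]
        # Compare: element-wise absolute difference to the aligned counterpart.
        diff = [abs(t - a) for t, a in zip(tok, aligned)]
        # Aggregate: sum the comparison vectors over all tokens of seq_a.
        evidence = [e + d for e, d in zip(evidence, diff)]
    return evidence


a = [[1.0, 0.0], [0.0, 1.0]]
b = [[-1.0, 0.0], [0.0, -1.0]]
print(align_compare_aggregate(a, a))  # small differences: tokens align with themselves
print(align_compare_aggregate(a, b))  # larger differences: no good alignment exists
```

Larger aggregated values signal more mismatch evidence; here the unrelated pair accumulates noticeably more than the identical pair.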

Deep entity matching with pre-trained language models

This paper fine-tunes pre-trained Transformer-based language models and casts EM as a sequence-pair classification problem, leveraging these models with a simple architecture, and establishes that Ditto can achieve the previous SOTA results with at most half the number of labeled data.
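
The serialization step behind Ditto's sequence-pair formulation can be sketched as below, in the `[COL] attribute [VAL] value` style the Ditto paper describes for flattening a record into text. The helper names are hypothetical, and tokenization (where `[COL]`/`[VAL]` would typically be registered as special tokens) and fine-tuning of the language model are omitted.

```python
def serialize_entry(entry):
    """Flatten an attribute->value dict into '[COL] attr [VAL] value ...' text."""
    return " ".join(f"[COL] {attr} [VAL] {value}" for attr, value in entry.items())


def serialize_pair(left, right):
    """Build the sequence-pair input for a binary match/no-match classifier."""
    return f"[CLS] {serialize_entry(left)} [SEP] {serialize_entry(right)} [SEP]"


left = {"title": "iPhone 13", "brand": "Apple"}
right = {"title": "Apple iPhone 13 128GB"}
print(serialize_pair(left, right))
# [CLS] [COL] title [VAL] iPhone 13 [COL] brand [VAL] Apple [SEP] [COL] title [VAL] Apple iPhone 13 128GB [SEP]
```

Because every entry becomes plain text, entries with different schemas can still be paired, which is what lets a single pre-trained language model act as the classifier.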

End-to-End Multi-Perspective Matching for Entity Resolution

This paper proposes an end-to-end multi-perspective entity matching model that adaptively selects optimal similarity measures for heterogeneous attributes by jointly learning and selecting those measures.
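
To make "selecting similarity measures per attribute" concrete, here is a sketch assuming a small fixed bank of measures (token Jaccard, a `difflib` character ratio, exact match) combined by softmax weights. The soft weighting is our own stand-in for the paper's learned selection, and the per-attribute logits are hand-set hypothetical values; in the actual model they would be learned jointly with the rest of the network.

```python
import difflib
import math


def jaccard(a, b):
    """Token-set Jaccard similarity (case-insensitive)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def edit_ratio(a, b):
    """Character-level similarity ratio in [0, 1] from difflib."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()


def exact(a, b):
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0


MEASURES = {"jaccard": jaccard, "edit": edit_ratio, "exact": exact}


def attribute_similarity(a, b, logits):
    """Softmax-weighted mix of measures; logits play the role of learned selection scores."""
    names = list(MEASURES)
    m = max(logits[n] for n in names)
    exps = {n: math.exp(logits[n] - m) for n in names}
    total = sum(exps.values())
    return sum(exps[n] / total * MEASURES[n](a, b) for n in names)


# Hypothetical per-attribute preferences: fuzzy text for titles, exact match for IDs.
title_logits = {"jaccard": 2.0, "edit": 1.0, "exact": -2.0}
print(attribute_similarity("Apple iPhone 13", "iPhone 13 Apple 128GB", title_logits))
```

A soft mixture keeps the choice of measure differentiable, which is one common way to let selection be learned end to end instead of hand-picked per attribute.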

ZeroER: Entity Resolution using Zero Labeled Examples

This paper investigates an important problem that vexes practitioners: is it possible to design an effective ER algorithm that requires zero labeled examples yet achieves performance comparable to supervised approaches? It presents such an approach, dubbed ZeroER.

How to Get Them a Dream Job?: Entity-Aware Features for Personalized Job Search Ranking

This paper proposes an approach that applies standardized entity data to improve job search quality and personalize search results, and introduces entity-faceted historical click-through rates to capture job document quality.
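
One plausible reading of an entity-faceted historical click-through rate is clicks divided by impressions, aggregated over all jobs sharing the same value of an entity facet (for example the same company or standardized title). The sketch below assumes that reading; the event format, the `clicked` field, and the facet names are hypothetical, and the paper's exact definition (any smoothing or time windows) is not reproduced here.

```python
from collections import defaultdict


def entity_faceted_ctr(events, facet):
    """Historical CTR (clicks / impressions) per value of one entity facet.

    Each event is assumed to be one impression: a dict holding the facet
    value (e.g. "company") and a boolean "clicked" flag (hypothetical schema).
    """
    clicks = defaultdict(int)
    impressions = defaultdict(int)
    for event in events:
        key = event[facet]
        impressions[key] += 1
        if event["clicked"]:
            clicks[key] += 1
    return {key: clicks[key] / impressions[key] for key in impressions}


events = [
    {"company": "Acme", "clicked": True},
    {"company": "Acme", "clicked": False},
    {"company": "Beta", "clicked": False},
]
print(entity_faceted_ctr(events, "company"))  # {'Acme': 0.5, 'Beta': 0.0}
```

Aggregating by entity rather than by individual posting gives new or rarely shown jobs a quality signal inherited from their company or title.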

Hierarchical Matching Network for Heterogeneous Entity Resolution

This paper proposes an end-to-end hierarchical matching network (HierMatcher) for entity resolution, which jointly matches entities at three levels: token, attribute, and entity.

Magellan: Toward Building Entity Matching Management Systems over Data Science Stacks

This paper discusses the limitations of current EM systems, presents Magellan, a new kind of EM system that addresses these limitations, and proposes demonstration scenarios that show the promise of the Magellan approach.