Deep entity matching with pre-trained language models

@article{Li2020DeepEM,
  title={Deep entity matching with pre-trained language models},
  author={Yuliang Li and Jinfeng Li and Yoshihiko Suhara and A. Doan and W. Tan},
  journal={Proceedings of the VLDB Endowment},
  year={2020},
  volume={14},
  pages={50--60}
}
We present Ditto, a novel entity matching (EM) system based on pre-trained Transformer-based language models. We cast EM as a sequence-pair classification problem and fine-tune such models with a simple architecture. Our experiments show that a straightforward application of language models such as BERT, DistilBERT, or RoBERTa, pre-trained on large text corpora, already significantly improves matching quality and outperforms the previous state-of-the-art (SOTA) by up to 29% F1 score.
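Concretely, casting EM as sequence-pair classification requires serializing each entity record into a token sequence before the pair is fed to the language model. A minimal sketch of that step, assuming the attribute/value serialization scheme the paper describes (the helper names here are illustrative, not the system's actual API):

```python
# Sketch of the serialization step: each record becomes a flat token
# sequence, and the pair of sequences is what a BERT-style model
# classifies as match / non-match.

def serialize(record: dict) -> str:
    """Flatten an entity record into a single COL/VAL token string."""
    return " ".join(f"COL {attr} VAL {val}" for attr, val in record.items())

def make_pair(left: dict, right: dict) -> str:
    """Build the sequence pair for a BERT-style classifier.

    In practice the two serializations are passed to the tokenizer as the
    two segments (text, text_pair); [SEP] stands in for that boundary.
    """
    return f"{serialize(left)} [SEP] {serialize(right)}"

left = {"title": "instant immersion spanish deluxe 2.0", "price": "49.99"}
right = {"title": "instant immers spanish dlux 2", "price": "36.11"}
print(make_pair(left, right))
```

Fine-tuning then reduces to training a standard sequence-pair classification head on these serialized pairs, which is why the architecture on top of the pre-trained model can stay simple.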
