AutoBlock: A Hands-off Blocking Framework for Entity Matching

@inproceedings{Zhang2020AutoBlockAH,
  title={AutoBlock: A Hands-off Blocking Framework for Entity Matching},
  author={Wei Zhang and Hao Wei and Bunyamin Sisman and Xin Dong and Christos Faloutsos and David Page},
  booktitle={Proceedings of the 13th International Conference on Web Search and Data Mining},
  year={2020}
}
  • Wei Zhang, Hao Wei, Bunyamin Sisman, Xin Dong, Christos Faloutsos, David Page
  • Published in WSDM '20, 2020
  • Computer Science
  • Entity matching seeks to identify data records over one or multiple data sources that refer to the same real-world entity. Virtually every entity matching task on large datasets requires blocking, a step that reduces the number of record pairs to be matched. However, most of the traditional blocking methods are learning-free and key-based, and their successes are largely built on laborious human effort in cleaning data and designing blocking keys. In this paper, we propose AutoBlock, a novel…
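
To make the abstract's notion of blocking concrete, here is a minimal Python sketch of the traditional key-based, learning-free approach that the abstract contrasts AutoBlock with. It is not AutoBlock's method. The `blocking_key` rule (first three characters of a normalized title), the `candidate_pairs` helper, and the sample records are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record):
    # Hypothetical hand-designed key: the first three characters of the
    # whitespace-trimmed, lowercased title. Real keys are usually tuned by hand.
    return record["title"].strip().lower()[:3]


def candidate_pairs(records):
    # Group record indices by blocking key.
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[blocking_key(rec)].append(idx)

    # Only records that share a block become candidate pairs for matching.
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(ids, 2))
    return pairs


# Made-up records, for illustration only.
records = [
    {"title": "AutoBlock framework"},
    {"title": "autoblock: hands-off blocking"},
    {"title": "Entity matching survey"},
    {"title": " Entity resolution"},
]

pairs = candidate_pairs(records)
all_pairs = len(records) * (len(records) - 1) // 2
print(f"{len(pairs)} candidate pairs instead of {all_pairs}")
```

The sketch also shows why such keys depend on manual effort: the key only works here because the titles are trimmed and lowercased first, which is the kind of data cleaning and key design the abstract describes as laborious.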

    References

    Publications referenced by this paper.

    Optimal Data-Dependent Hashing for Approximate Near Neighbors

    Similarity estimation techniques from rounding algorithms

    A Blocking Framework for Entity Resolution in Highly Heterogeneous Information Spaces

    A Fast Linkage Detection Scheme for Multi-Source Information Integration

    • Akiko Aizawa, Keizo Oyama
    • Computer Science
    • International Workshop on Challenges in Web Information Retrieval and Integration
    • 2005

    Approximate String Joins in a Database (Almost) for Free
