• Publications
  • Influence
CrowdER: Crowdsourcing Entity Resolution
Entity resolution is central to data integration and data cleaning. Algorithmic approaches have been improving in quality, but remain far from perfect. Crowdsourcing platforms offer a more accurate...
  • 469
  • 35
Can we beat the prefix filtering?: an adaptive framework for similarity join and search
As two important operations in data cleaning, similarity join and similarity search have attracted much attention recently. Existing methods to support similarity join usually adopt a...
  • 177
  • 28
Leveraging transitive relations for crowdsourced joins
The development of crowdsourced query processing systems has recently attracted significant attention in the database community. A variety of crowdsourced queries have been investigated. In this...
  • 190
  • 24
PASS-JOIN: A Partition-based Method for Similarity Joins
As an essential operation in data cleaning, the similarity join has attracted considerable attention from the database community. In this paper, we study string similarity joins with edit-distance...
  • 166
  • 20
MassJoin: A MapReduce-based method for scalable string similarity joins
String similarity join is an essential operation in data integration. The era of big data calls for scalable algorithms to support large-scale string similarity joins. In this paper, we study...
  • 101
  • 12
A string similarity join finds similar pairs between two collections of strings. It is an essential operation in many applications, such as data integration and cleaning, and has attracted...
  • 124
  • 12
Fast-join: An efficient method for fuzzy token matching based string similarity join
String similarity join, which finds similar string pairs between two string sets, is an essential operation in many applications, and has attracted significant attention recently in the database...
  • 118
  • 11
Entity Matching: How Similar Is Similar
Entity matching, which finds records referring to the same entity, is an important operation in data cleaning and integration. Existing studies usually use a given similarity function to quantify the...
  • 85
  • 11
QASCA: A Quality-Aware Task Assignment System for Crowdsourcing Applications
A crowdsourcing system, such as the Amazon Mechanical Turk (AMT), provides a platform for a large number of questions to be answered by Internet workers. Such systems have been shown to be useful to...
  • 140
  • 8
Trie-join: a trie-based method for efficient string similarity joins
A string similarity join finds similar pairs between two collections of strings. Many applications, e.g., data integration and cleaning, can significantly benefit from an efficient...
  • 61
  • 5