Publications
Running Experiments on Amazon Mechanical Turk
Although Mechanical Turk has recently become popular among social scientists as a source of experimental data, doubts may linger about the quality of data provided by participants recruited from…
  • Citations: 3,399
  • Highly influential citations: 124
Duplicate Record Detection: A Survey
Often, in the real world, entities have two or more representations in databases. Duplicate records do not share a common key and/or they contain errors that make duplicate matching a difficult task. …
  • Citations: 1,547
  • Highly influential citations: 122
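The matching difficulty described above can be illustrated with a minimal sketch, assuming records are plain strings: each record is reduced to a set of lowercase tokens, and any pair whose Jaccard similarity reaches a threshold is flagged as a likely duplicate. Token Jaccard is only one of many possible similarity measures, and the helper names (`tokens`, `jaccard`, `find_duplicates`), the 0.5 threshold, and the sample records are hypothetical illustrations, not the survey's method.

```python
import re
from itertools import combinations


def tokens(record: str) -> set[str]:
    """Lowercase a record and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", record.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_duplicates(records: list[str], threshold: float = 0.5) -> list[tuple[int, int]]:
    """Return index pairs whose token sets have Jaccard similarity >= threshold.

    Hypothetical helper: compares every pair of records (O(n^2)).
    """
    toks = [tokens(r) for r in records]
    return [
        (i, j)
        for i, j in combinations(range(len(records)), 2)
        if jaccard(toks[i], toks[j]) >= threshold
    ]


# Hypothetical records: the first two describe the same person with no shared key.
records = [
    "John A. Smith, 12 Main Street, Springfield",
    "Smith, John, 12 Main St., Springfield",
    "Mary Jones, 4 Oak Avenue, Shelbyville",
]
print(find_duplicates(records))  # → [(0, 1)]
```

Comparing every pair is quadratic in the number of records; practical systems usually add a blocking step so that only records sharing some key (for example, the same postcode) are compared.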
Demographics of Mechanical Turk
We present the results of a survey that collected information about the demographics of participants on Amazon Mechanical Turk, together with information about their level of activity and motivation…
  • Citations: 559
  • Highly influential citations: 93
Analyzing the Amazon Mechanical Turk Marketplace
Since the concept of crowdsourcing is relatively new, many potential participants have questions about the AMT marketplace. For example, a common set of questions that pop up in an 'introduction to…
  • Citations: 786
  • Highly influential citations: 92
Estimating the Helpfulness and Economic Impact of Product Reviews: Mining Text and Reviewer Characteristics
With the rapid growth of the Internet, the ability of users to create and publish content has created active electronic communities that provide a wealth of product information. However, the high…
  • Citations: 891
  • Highly influential citations: 77
Quality management on Amazon Mechanical Turk
Crowdsourcing services, such as Amazon Mechanical Turk, allow for easy distribution of small tasks to a large number of workers. Unfortunately, since manually verifying the quality of the submitted…
  • Citations: 900
  • Highly influential citations: 67
Get Another Label? Improving Data Quality and Data Mining Using Multiple, Noisy Labelers
This paper addresses the repeated acquisition of labels for data items when the labeling is imperfect. We examine the improvement (or lack thereof) in data quality via repeated labeling, and focus…
  • Citations: 988
  • Highly influential citations: 64
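The "improvement (or lack thereof)" from repeated labeling can be made concrete with a standard binomial calculation. Assuming binary labels, an odd number n of labelers, and that every labeler is independently correct with the same probability p, a majority vote is correct with probability q = Σ_{k=(n+1)/2}^{n} C(n, k) p^k (1 − p)^(n − k). The uniform-quality and independence assumptions and the function name `majority_accuracy` are illustrative; this is a sketch, not the paper's full analysis.

```python
from math import comb


def majority_accuracy(p: float, n_labelers: int) -> float:
    """Probability that a majority vote of n_labelers independent labelers,
    each correct with probability p, yields the correct binary label.

    Hypothetical helper; n_labelers must be odd so that ties cannot occur.
    """
    if n_labelers % 2 == 0:
        raise ValueError("use an odd number of labelers to avoid ties")
    need = n_labelers // 2 + 1  # smallest winning majority
    # Binomial tail: sum over every count k of correct labelers that wins the vote.
    return sum(
        comb(n_labelers, k) * p**k * (1 - p) ** (n_labelers - k)
        for k in range(need, n_labelers + 1)
    )


print(round(majority_accuracy(0.6, 3), 3))  # → 0.648
print(round(majority_accuracy(0.4, 3), 3))  # → 0.352
```

Under these assumptions, going from one to three labelers with p = 0.6 raises accuracy from 0.6 to 0.648, while with p = 0.4 three labelers lower it to 0.352, one way repeated labeling can fail to improve quality.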
Approximate String Joins in a Database (Almost) for Free
String data is ubiquitous, and its management has taken on particular importance in the past few years. Approximate queries are very important on string data especially for more complex queries…
  • Citations: 605
  • Highly influential citations: 64
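Approximate string joins are commonly built on q-gram filtering. A string s padded with q − 1 sentinel characters at each end has |s| + q − 1 q-grams, and each edit operation changes at most q of them, so two strings within edit distance k share at least max(|a|, |b|) − 1 − (k − 1)·q q-grams. Pairs below that count can be pruned before the expensive exact check. The sketch below is an in-memory Python illustration with hypothetical helper names (`qgrams`, `edit_distance`, `approx_join`) and sample strings; it is not the paper's database-resident implementation, which the truncated abstract does not describe.

```python
from collections import Counter


def qgrams(s: str, q: int = 3) -> Counter:
    """Multiset of q-grams of s, padded with q-1 '#' at the start and q-1 '$' at the end."""
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return Counter(padded[i : i + q] for i in range(len(padded) - q + 1))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via a single-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            # deletion, insertion, substitution (cost 0 if characters match)
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def approx_join(left: list[str], right: list[str], k: int = 1, q: int = 3) -> list[tuple[str, str]]:
    """Hypothetical helper: all pairs (a, b) with edit_distance(a, b) <= k."""
    left_grams = [(a, qgrams(a, q)) for a in left]
    right_grams = [(b, qgrams(b, q)) for b in right]
    result = []
    for a, ga in left_grams:
        for b, gb in right_grams:
            common = sum((ga & gb).values())  # size of the multiset intersection
            if common < max(len(a), len(b)) - 1 - (k - 1) * q:
                continue  # count filter: cannot be within distance k
            if edit_distance(a, b) <= k:  # exact verification of surviving pairs
                result.append((a, b))
    return result


left = ["Jonathan Smith", "Mary Jones"]
right = ["Jonathon Smith", "Marie Jones", "Mary Jones"]
print(approx_join(left, right, k=1))
# → [('Jonathan Smith', 'Jonathon Smith'), ('Mary Jones', 'Mary Jones')]
```

Because the count filter only discards pairs that provably exceed distance k, the result equals a brute-force join; a length filter (| |a| − |b| | ≤ k) is another cheap pruning step that could be added.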
Deriving the Pricing Power of Product Features by Mining Consumer Reviews
Increasingly, user-generated product reviews serve as a valuable source of information for customers making product choices online. The existing literature typically incorporates the impact of…
  • Citations: 509
  • Highly influential citations: 40
Building a Bird Recognition App and Large Scale Dataset with Citizen Scientists: The Fine Print in Fine-Grained Dataset Collection
We introduce tools and methodologies to collect high quality, large scale fine-grained computer vision datasets using citizen scientists, that is, crowd annotators who are passionate and knowledgeable about…
  • Citations: 162
  • Highly influential citations: 30