Cheap and Fast - But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks


Human linguistic annotation is crucial for many natural language processing tasks but can be expensive and time-consuming. We explore the use of Amazon's Mechanical Turk system, a significantly cheaper and faster method for collecting annotations from a broad base of paid non-expert contributors over the Web. We investigate five tasks: affect recognition, word similarity, recognizing textual entailment, event temporal ordering, and word sense disambiguation. For all five, we show high agreement between Mechanical Turk non-expert annotations and existing gold standard labels provided by expert labelers. For the task of affect recognition, we also show that using non-expert labels for training machine learning algorithms can be as effective as using gold standard annotations from experts. We propose a technique for bias correction that significantly improves annotation quality on two tasks. We conclude that many large labeling tasks can be effectively designed and carried out using this method at a fraction of the usual expense.
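The abstract does not spell out how redundant non-expert labels are combined or how the bias correction works. As a rough illustration only, the sketch below assumes that numeric ratings (as in affect recognition or word similarity) are averaged over several annotators per item and compared with expert gold labels via Pearson correlation, and that binary labels (as in textual entailment) are combined by a vote in which each annotator is weighted by the log-odds of their accuracy on a few gold-labeled calibration items. The function names and data layout are hypothetical; this is a generic weighted-vote scheme, not necessarily the paper's exact bias-correction model.

```python
from math import log, sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def average_ratings(ratings_per_item):
    """Collapse several non-expert numeric ratings per item into their mean."""
    return [mean(r) for r in ratings_per_item]


def annotator_weights(gold_votes, gold_labels, smoothing=1.0):
    """Hypothetical calibration step: weight each annotator by the log-odds
    of their Laplace-smoothed accuracy on gold-labeled items.

    gold_votes maps annotator -> {item_id: label};
    gold_labels maps item_id -> expert label.
    smoothing must be > 0 so accuracy stays strictly between 0 and 1.
    """
    weights = {}
    for worker, votes in gold_votes.items():
        correct = sum(1 for item, label in votes.items() if gold_labels[item] == label)
        acc = (correct + smoothing) / (len(votes) + 2 * smoothing)
        weights[worker] = log(acc / (1 - acc))
    return weights


def weighted_vote(votes, weights):
    """Combine binary (True/False) votes for one item.

    Each vote counts +weight for True and -weight for False;
    annotators without a calibration weight contribute nothing.
    """
    score = sum(weights.get(w, 0.0) * (1 if label else -1) for w, label in votes.items())
    return score > 0


if __name__ == "__main__":
    # Toy data (hypothetical): three non-expert ratings per item vs. expert scores.
    nonexpert = [[4, 5, 4], [1, 2, 1], [3, 3, 2]]
    expert = [4.5, 1.0, 3.0]
    print(pearson(average_ratings(nonexpert), expert))
```

With log-odds weights, an annotator at chance accuracy (50%) contributes nothing, and one who is systematically wrong on the calibration items has their votes effectively inverted; that is one simple sense in which calibrating against a small amount of gold data can correct annotator bias.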

11 Figures and Tables

1,616 Citations

Semantic Scholar estimates that this publication has received between 1,452 and 1,801 citations based on the available data.