Utility data annotation with Amazon Mechanical Turk

@article{Sorokin2008UtilityDA,
  title={Utility data annotation with Amazon Mechanical Turk},
  author={Alexander Sorokin and David Alexander Forsyth},
  journal={2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops},
  year={2008},
  pages={1-8}
}
  • A. Sorokin, D. Forsyth
  • Published 23 June 2008
  • Computer Science
  • 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
We show how to outsource data annotation to Amazon Mechanical Turk. Doing so has produced annotations in quite large numbers relatively cheaply. The quality is good, and can be checked and controlled. Annotations are produced quickly. We describe results for several different annotation problems. We describe some strategies for determining when the task is well specified and properly priced. 
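The workflow the paper describes is built on Mechanical Turk's HIT (Human Intelligence Task) mechanism: an annotation task is posted at a chosen price per assignment, several workers complete it, and the requester reviews the submissions for quality. As a rough illustration only (the paper predates this API), the sketch below posts a single image-annotation HIT with today's boto3 MTurk client; the annotation-page URL, reward, and assignment counts are hypothetical placeholders, not values from the paper.

# Minimal sketch, not the authors' 2008 tooling: post an image-annotation HIT
# using the modern boto3 MTurk client. URL, reward, and counts are placeholders.
import boto3

mturk = boto3.client("mturk", region_name="us-east-1")

# ExternalQuestion embeds a self-hosted annotation page (hypothetical URL).
question_xml = """
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://example.com/annotate?image=0001.jpg</ExternalURL>
  <FrameHeight>600</FrameHeight>
</ExternalQuestion>
"""

hit = mturk.create_hit(
    Title="Outline the person in the image",
    Description="Draw a polygon around each person in the photo.",
    Keywords="image, annotation, labeling",
    Reward="0.05",                    # price per assignment, in USD
    MaxAssignments=3,                 # several workers per image enables quality checks
    LifetimeInSeconds=3 * 24 * 3600,  # how long the HIT stays available
    AssignmentDurationInSeconds=600,  # time a worker has to finish one assignment
    Question=question_xml,
)
print("Posted HIT:", hit["HIT"]["HITId"])

# Later: fetch submitted assignments for review, approval, or rejection.
submitted = mturk.list_assignments_for_hit(
    HITId=hit["HIT"]["HITId"],
    AssignmentStatuses=["Submitted"],
)
for assignment in submitted["Assignments"]:
    print(assignment["WorkerId"], assignment["AssignmentStatus"])

Collecting several assignments per image (MaxAssignments above) is what makes the quality checks mentioned in the abstract possible: redundant annotations can be compared or merged before payment is approved.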

Citations

Collecting Image Annotations Using Amazon’s Mechanical Turk
TLDR
It is found that the use of a qualification test provides the greatest improvement in quality, whereas refining the annotations through follow-up tasks works rather poorly.
Efficiently Scaling up Crowdsourced Video Annotation: A Set of Best Practices for High Quality, Economical Video Labeling
TLDR
It is argued that video annotation requires specialized skill and that most workers are poor annotators, mandating robust quality control protocols, and that there is an inherent trade-off between the mix of human and cloud computing used and the accuracy and cost of the labeling.
Online crowdsourcing: Rating annotators and obtaining cost-effective labels
  • P. Welinder, P. Perona
  • Computer Science
    2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops
  • 2010
TLDR
A model of the labeling process which includes label uncertainty, as well as a multi-dimensional measure of the annotators' ability, is proposed, from which an online algorithm is derived that estimates the most likely value of the labels and the annotator abilities.
Data Quality from Crowdsourcing: A Study of Annotation Selection Criteria
TLDR
An empirical study is conducted to examine the effect of noisy annotations on the performance of sentiment classification models and to evaluate the utility of annotation selection on classification accuracy and efficiency.
How reliable are annotations via crowdsourcing: a study about inter-annotator agreement for multi-label image annotation
TLDR
The majority vote, applied to generate one annotation set out of several opinions, is able to filter noisy judgments of non-experts to some extent, and the resulting annotation set is of comparable quality to the annotations of experts.
A Lightweight Combinatorial Approach for Inferring the Ground Truth from Multiple Annotators
TLDR
This work presents a discriminative approach to inferring the ground truth class labels by mapping both annotators and tasks into a low-dimensional space, offering greater simplicity and computational efficiency than state-of-the-art Bayesian methods.
Quality Assessment for Crowdsourced Object Annotations
TLDR
Strategies for automatically estimating the quality of a spatial object annotation are proposed and evaluated, and it is shown that one can significantly outperform simple baselines, such as that used by LabelMe, by combining multiple image-based assessment strategies.
Cheap and Fast – But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks
TLDR
This work explores the use of Amazon's Mechanical Turk system, a significantly cheaper and faster method for collecting annotations from a broad base of paid non-expert contributors over the Web, and proposes a technique for bias correction that significantly improves annotation quality on two tasks.
Reexamination on Voting for Crowd Sourcing MT Evaluation
TLDR
This model focuses on how to use poor-quality crowdsourcing data to obtain high-quality sorted data and achieves better results than the voting model in all cases in the authors' experiment, including the sorting of both two and four translations.
...

References

SHOWING 1-10 OF 29 REFERENCES
LabelMe: A Database and Web-Based Tool for Image Annotation
TLDR
A web-based tool that allows easy image annotation and instant sharing of such annotations is developed, and a large dataset that spans many object categories, often containing multiple instances over a wide variety of images, is collected.
Evaluation of Localized Semantics: Data, Methodology, and Experiments
TLDR
A new data set of 1014 images with manual segmentations and semantic labels for each segment is presented, together with a methodology for using this kind of data for recognition evaluation, and four algorithms which learn to label image regions from weakly labeled data are evaluated.
Labeling images with a computer game
TLDR
A new interactive system is presented: a game that is fun and can be used to create valuable output, addressing the image-labeling problem by encouraging people to do the work through their desire to be entertained.
Introduction to a Large-Scale General Purpose Ground Truth Database: Methodology, Annotation Tool and Benchmarks
TLDR
A large-scale, general-purpose image database with human-annotated ground truth, consisting of more than 636,748 annotated images and video frames, is presented.
Building a Large Annotated Corpus of English: The Penn Treebank
TLDR
As a result of this grant, the researchers have now published on CD-ROM a corpus of over 4 million words of running text annotated with part-of-speech (POS) tags, which includes a fully hand-parsed version of the classic Brown corpus.
Peekaboom: a game for locating objects in images
TLDR
Peekaboom is an entertaining web-based game that can help computers locate objects in images and is an example of a new, emerging class of games, which not only bring people together for leisure purposes, but also exist to improve artificial intelligence.
Caltech-256 Object Category Dataset
TLDR
A challenging set of 256 object categories containing a total of 30607 images is introduced and the clutter category is used to train an interest detector which rejects uninformative background regions.
Learning to parse images of articulated bodies
TLDR
This work considers the machine vision task of pose estimation from static images, specifically for the case of articulated objects, and casts visual inference as an iterative parsing process, where one sequentially learns better and better features tuned to a particular image.
A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics
TLDR
A database containing 'ground truth' segmentations produced by humans for images of a wide variety of natural scenes is presented and an error measure is defined which quantifies the consistency between segmentations of differing granularities.
The PASCAL visual object classes challenge 2006 (VOC2006) results
This report presents the results of the 2006 PASCAL Visual Object Classes Challenge (VOC2006). Details of the challenge, data, and evaluation are presented. Participants in the challenge submitted …
...