Crowdsourcing user studies with Mechanical Turk

  Aniket Kittur, Ed H. Chi, Bongwon Suh
User studies are important for many aspects of the design process and involve techniques ranging from informal surveys to rigorous laboratory studies. However, the costs involved in engaging users often require practitioners to trade off between sample size, time requirements, and monetary costs. Micro-task markets, such as Amazon's Mechanical Turk, offer a potential paradigm for engaging a large number of users for low time and monetary costs. Here we investigate the utility of a micro-task…


User interface design for crowdsourcing systems
This work studies the effects of different user interface designs on the performance of crowdsourcing systems and indicates that user interface design choices have a significant effect on crowdsourced worker performance.
Crowdsourcing performance evaluations of user interfaces
Three previously well-studied user interface designs are evaluated, showing that MTurk may be a productive setting for conducting performance evaluations of user interfaces and a complementary approach to existing methodologies.
Crowdsourcing, cognitive load, and user interface design
This work studies the effects of different user interface designs on the performance and latency of crowdsourcing systems, indicating that complex and poorly designed user interfaces lower worker performance and increase task latency.
Toward Crowdsourced User Studies for Software Evaluation
This work-in-progress paper describes a vision of fast and reliable software user experience studies conducted with help from the crowd, examining which user study methods can be crowdsourced to generic audiences so that user studies can be run without expensive lab experiments.
Investigating the Efficacy of Crowdsourcing on Evaluating Visual Decision Supporting System
This study replicated a controlled lab study of decision-making tasks with different sorting techniques using crowdsourcing, and identified sources of problems that, once addressed, could make crowdsourced experiments more viable.
CrowdStudy: general toolkit for crowdsourced evaluation of web interfaces
CrowdStudy is a general web toolkit that combines support for automated usability testing with crowdsourcing to facilitate large-scale online user testing and demonstrates several useful features of CrowdStudy for two different scenarios, and discusses the benefits and tradeoffs of using crowdsourced evaluation.
Exploring task properties in crowdsourcing - an empirical study on mechanical turk
A series of explorative studies on task properties on MTurk implies that factors other than demographics influence workers' task selection, contributing to a better understanding of task choice.
Assessing Crowdsourcing Quality through Objective Tasks
One of the interesting findings is that the results do not confirm previous studies concluding that higher payment attracts more noise; the country of origin has an impact only in some categories and only for general text questions, with no significant difference at the top pay level.
Exploring Crowd Consistency in a Mechanical Turk Survey
  • Peng Sun, Kathryn T. Stolee
  • Computer Science
    2016 IEEE/ACM 3rd International Workshop on CrowdSourcing in Software Engineering (CSI-SE)
  • 2016
Characteristics of the worker population on Amazon's Mechanical Turk, a popular micro-task crowdsourcing environment, are explored, and the percentage of workers who are potentially qualified to perform software- or computer science-related tasks is measured.
Using Amazon's Mechanical Turk for User Studies: Eight Things You Need to Know
  • L. Layman, Gunnar Sigurdsson
  • Computer Science
    2013 ACM / IEEE International Symposium on Empirical Software Engineering and Measurement
  • 2013
Describes limitations encountered when using Amazon's Mechanical Turk during an experiment on cyber-attack investigation techniques, and provides eight considerations for experimental design so that other researchers can maximize the benefits of Mechanical Turk as a research platform.


What happened to remote usability testing?: an empirical study of three methods
Results from a systematic empirical comparison of three methods for remote usability testing and a conventional laboratory-based think-aloud method show that the remote synchronous method is virtually equivalent to the conventional method.
Testing web sites: five users is nowhere near enough
We observed the same task executed by 49 users on four production web sites. We tracked the rates of discovery of new usability problems on each site and, using that data, estimated the total number of problems on each site.
Coase's Penguin, or Linux and the Nature of the Firm
It is suggested that the authors are seeing the broad and deep emergence of a new, third mode of production in the digitally networked environment, a mode I call commons-based peer production, which has systematic advantages over markets and managerial hierarchies when the object of production is information or culture.
Web credibility research: a method for online experiments and early study results
Through iterative design and testing, a procedure for conducting online experiments on Web credibility is developed; early results have implications for both HCI researchers and Web site designers.
The Hidden Order of Wikipedia
This case study is the Featured Article (FA) process, one of the best established procedures on Wikipedia, and it is demonstrated how this process blends elements of traditional workflow with peer production.
He says, she says: conflict and coordination in Wikipedia
The growth of non-direct work in Wikipedia is examined, and the development of tools to characterize conflict and coordination costs in Wikipedia is described, which may inform the design of new collaborative knowledge systems.
Featured Article Criteria
  • accessed Sep 2007
CHI 2008 Proceedings · Data Collection
  • 2008