Confusing the Crowd: Task Instruction Quality on Amazon Mechanical Turk

@inproceedings{Wu2017ConfusingTC,
  title={Confusing the Crowd: Task Instruction Quality on Amazon Mechanical Turk},
  author={Meng-Han Wu and Alexander J. Quinn},
  booktitle={HCOMP},
  year={2017}
}
Task instruction quality is widely presumed to affect outcomes such as accuracy, throughput, trust, and worker satisfaction. Best-practice guides written by experienced requesters offer advice about how to craft task interfaces. However, there is little evidence of how specific task design attributes affect actual outcomes. This paper presents a set of studies that expose the relationship between three sets of measures: (a) workers’ perceptions of task quality, (b) adherence to…

Citations

TaskMate: A Mechanism to Improve the Quality of Instructions in Crowdsourcing

TaskMate, a system for facilitating worker-led refinement of task instructions with minimal involvement by the requester, is presented; small teams of workers search for ambiguities and vote on the interpretation they believe the requester intended.

WingIt: Efficient Refinement of Unclear Task Instructions

The system, WingIt, implements a set of methods that enable workers to cope with unclear or ambiguous instructions and produce high-quality results with minimal reliance on the requester.

A Checklist to Combat Cognitive Biases in Crowdsourcing

It is shown that cognitive biases may often affect crowd workers but are typically not considered potential sources of poor data quality; a 12-item checklist, adapted from business psychology, is proposed to combat cognitive biases in crowdsourcing.

Sprout: Crowd-Powered Task Design for Crowdsourcing

A novel meta-workflow that helps requesters optimize crowdsourcing task designs is proposed; Sprout, an open-source tool that implements this workflow, improves task designs by eliciting points of confusion from crowd workers, enabling requesters to quickly understand these misconceptions and the overall space of questions.

The Expertise Involved in Deciding which HITs are Worth Doing on Amazon Mechanical Turk

It is argued that differences between the two populations likely lead to the wage imbalances, and several future directions are identified, including machine learning models that support workers in detecting poor-quality labor and paths for educating novice workers on how to make better labor decisions on AMT.

On the State of Reporting in Crowdsourcing Experiments and a Checklist to Aid Current Practices

Starting from sensible implementation choices identified through existing literature and interviews with experts, this work extensively analyzes the reporting of 171 crowdsourcing experiments and proposes a checklist for reporting crowdsourced experiments.

Annotator Rationales for Labeling Tasks in Crowdsourcing

This work investigates asking judges to provide a specific form of rationale supporting each rating decision and suggests that the approach is a win-win on an information retrieval task in which human judges rate the relevance of Web pages for different search topics.

What Is Unclear? Computational Assessment of Task Clarity in Crowdsourcing

It is found that the content, style, and readability of task descriptions are particularly important in shaping their clarity, with important implications for the design of tools that help requesters improve task clarity on crowdsourcing platforms.

Daily Turking: Designing Longitudinal Daily-task Studies on Mechanical Turk

It is found that the Mechanical Turk platform is a viable method for conducting longitudinal daily-task studies that augment or replace traditional lab studies; the specific concern of reconciling informed consent with workers’ desire to complete tasks quickly is also investigated.

Predicting the Working Time of Microtasks Based on Workers' Perception of Prediction Errors

A computational technique is described for predicting microtask working times based on workers’ past experiences with similar tasks, along with challenges encountered in defining evaluation and objective functions in light of the tolerance workers demonstrate toward prediction errors.

References

Showing 1-10 of 34 references

Exploring task properties in crowdsourcing - an empirical study on mechanical turk

A series of exploratory studies of task properties on MTurk implies that factors other than demographics influence workers’ task selection and contributes to a better understanding of task choice.

Curiosity Killed the Cat, but Makes Crowdwork Better

The potential for curiosity as a new type of intrinsic motivational driver to incentivize crowd workers is examined; crowdsourcing task interfaces that explicitly incorporate mechanisms to induce curiosity are designed, and a set of experiments is conducted on Amazon’s Mechanical Turk.

Taking a HIT: Designing around Rejection, Mistrust, Risk, and Workers' Experiences in Amazon Mechanical Turk

It is argued that treating risk reduction and trust building as first-class design goals can lead to solutions that improve outcomes around rejected work for all parties in online labor markets.

Toward automatic task design: a progress report

This paper considers a common problem that requesters face on Amazon Mechanical Turk, namely how a task should be designed so as to induce good output from workers, and constructs models for predicting the rate and quality of work based on observations of output under various designs.

An Assessment of Intrinsic and Extrinsic Motivation on Task Performance in Crowdsourcing Markets

Results suggest that intrinsic motivation can indeed improve the quality of workers’ output, confirming the hypothesis, and reveal a synergistic interaction between intrinsic and extrinsic motivators that runs contrary to previous literature suggesting “crowding out” effects.

Fantasktic: Improving Quality of Results for Novice Crowdsourcing Users

It is shown that novice users can receive higher-quality results when supported by a guided task specification interface, although tasks designed by experts still perform comparatively better.

Crowdsourcing Document Relevance Assessment with Mechanical Turk

While results are largely inconclusive, important obstacles encountered, lessons learned, related work, and interesting ideas for future investigation are identified.

Clarity is a Worthwhile Quality: On the Role of Task Clarity in Microtask Crowdsourcing

This paper surveyed 100 workers of the CrowdFlower platform to verify the presence of issues with task clarity in crowdsourcing marketplaces, reveal how crowd workers deal with such issues, and motivate the need for mechanisms that can predict and measure task clarity.

Reputation as a sufficient condition for data quality on Amazon Mechanical Turk

It is concluded that sampling high-reputation workers can ensure high-quality data without having to resort to using attention check questions (ACQs), which may lead to selection bias if participants who fail ACQs are excluded post-hoc.

Designing incentives for inexpert human raters

It is found that treatment conditions that asked workers to prospectively think about the responses of their peers, when combined with financial incentives, produced more accurate performance.