Publications
Effective Crowd Annotation for Relation Extraction
TLDR
This paper demonstrates that crowdsourced annotation of training data can yield a much larger boost in relation extraction performance over methods based solely on distant supervision, thanks to a simple, generalizable technique, Gated Instruction.
Crowdsourcing Multi-Label Classification for Taxonomy Creation
TLDR
This paper presents DELUGE, an improved workflow that produces taxonomies of comparable quality with significantly less crowd labor, optimizing CASCADE’s most costly step (categorization) with less than 10% of the labor required by the original approach.
MicroTalk: Using Argumentation to Improve Crowdsourcing Accuracy
TLDR
This paper presents a new quality-control workflow that requires some workers to Justify their reasoning and asks others to Reconsider their decisions after reading counter-arguments from workers with opposing views, which produces much higher accuracy than simpler voting approaches for a range of budgets.
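As a rough illustration of the Justify/Reconsider idea (a hypothetical sketch, not the paper's actual implementation; the function names and data shapes are assumptions), the workflow can be thought of as: collect initial labels, have disagreeing workers exchange justifications, then aggregate the post-reconsideration votes.
```python
def microtalk_style_label(initial_labels, justifications, reconsider):
    """Hypothetical sketch of an argue-then-reconsider workflow.

    initial_labels: dict worker -> bool, each worker's first answer
    justifications: dict worker -> str, the rationale each worker wrote
    reconsider(worker, counter_argument) -> bool, the worker's answer after
        reading one justification from the opposing side
    """
    yes = [w for w, v in initial_labels.items() if v]
    no = [w for w, v in initial_labels.items() if not v]
    if not yes or not no:
        # Unanimous answers skip the debate stage entirely.
        return all(initial_labels.values())
    final = {}
    for w in yes:                      # supporters read an opposing argument
        final[w] = reconsider(w, justifications[no[0]])
    for w in no:                       # opponents read a supporting argument
        final[w] = reconsider(w, justifications[yes[0]])
    return sum(final.values()) > len(final) / 2  # simple majority after debate
```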
Optimal Testing for Crowd Workers
TLDR
Evaluations both on synthetic data and with real Mechanical Turk workers show that the agent learns adaptive testing policies that produce up to 111% more reward than the non-adaptive policies used by most requesters.
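A minimal sketch of the adaptive-testing idea (assuming a simple Beta posterior over worker accuracy; the thresholds and names below are illustrative, not the paper's learned policy):
```python
def adaptive_test(answer_stream, accept_at=0.80, reject_at=0.55, max_tests=20):
    """Test a worker with gold questions until the posterior over their accuracy
    is confident enough to accept (route real work) or reject (stop testing)."""
    alpha, beta = 1.0, 1.0          # Beta(1, 1) prior over the worker's accuracy
    n = 0
    for n, correct in enumerate(answer_stream, start=1):
        if correct:
            alpha += 1
        else:
            beta += 1
        mean = alpha / (alpha + beta)
        if mean >= accept_at:
            return "accept", n      # stop spending budget on tests early
        if mean <= reject_at:
            return "reject", n
        if n >= max_tests:
            break
    return "undecided", n           # a fixed-length quiz would always pay for max_tests
```
For example, `adaptive_test(iter([True, True, True, False, True]))` returns `("accept", 3)`, stopping after three questions rather than a full fixed-length quiz.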
GENIE: A Leaderboard for Human-in-the-Loop Evaluation of Text Generation
TLDR
This work introduces GENIE, an extensible human evaluation leaderboard, which brings the ease of leaderboards to text generation tasks, provides formal, granular evaluation metrics, and identifies areas for future research.
Parallel Task Routing for Crowdsourcing
TLDR
It is proved that even the simplest task routing problem is NP-hard; however, an intuitive class of requesters' utility functions is submodular, which permits iterative methods for dynamically allocating batches of tasks that make near-optimal use of available workers in each round.
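The submodularity of those utility functions is what makes simple greedy allocation near-optimal; a minimal sketch of one routing round (the `utility` set function and per-round capacity are assumptions for illustration, not the paper's interface) could look like:
```python
def greedy_route_round(workers, tasks, utility, capacity=1):
    """Greedily build one round's batch of (worker, task) assignments.

    utility is assumed to be a monotone submodular set function over
    (worker, task) pairs, so picking the largest marginal gain at each step
    carries the standard (1 - 1/e) approximation guarantee.
    """
    assignment, load = set(), {w: 0 for w in workers}
    assigned_tasks = set()
    while True:
        base = utility(assignment)
        best_gain, best_pair = 0.0, None
        for w in workers:
            if load[w] >= capacity:
                continue
            for t in tasks:
                if t in assigned_tasks:
                    continue            # each task goes to one worker per round
                gain = utility(assignment | {(w, t)}) - base
                if gain > best_gain:
                    best_gain, best_pair = gain, (w, t)
        if best_pair is None:           # no remaining assignment adds utility
            return assignment
        assignment.add(best_pair)
        load[best_pair[0]] += 1
        assigned_tasks.add(best_pair[1])
```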
FLEX: Unifying Evaluation for Few-Shot NLP
TLDR
The FLEX Principles, a set of requirements and best practices for unified, rigorous, valid, and cost-sensitive few-shot NLP evaluation, are formulated; they include Sample Size Design, a novel approach to benchmark design that optimizes statistical accuracy and precision while keeping evaluation costs manageable.
Subcontracting Microwork
TLDR
It is argued that crowdwork platforms can improve their value proposition for all stakeholders by supporting subcontracting within microtasks, and three models for microtask subcontracting are defined: real-time assistance, task management, and task improvement.
Artificial Intelligence and Collective Intelligence
TLDR
This chapter explores the immense value of AI techniques for collective intelligence, including ways to make interactions between large numbers of humans more efficient, and the use of machine intelligence for the management of crowdsourcing platforms.
Cicero: Multi-Turn, Contextual Argumentation for Accurate Crowdsourcing
TLDR
Cicero is presented, a new workflow that improves crowd accuracy on difficult tasks by engaging workers in multi-turn, contextual discussions through real-time, synchronous argumentation.