Corpus ID: 114895158

Crowdsourcing for Speech: Economic, Legal and Ethical analysis

Gilles Adda, J. Mariani, Laurent Besacier, Hadrien Gelas
With respect to spoken language resource production, Crowdsourcing - the process of distributing tasks to an open, unspecified population via the internet - offers a wide range of opportunities: populations with specific skills are potentially instantaneously accessible somewhere on the globe for any spoken language. As is the case for most newly introduced high-tech services, crowdsourcing raises both hopes and doubts, certainties and questions. A general analysis of Crowdsourcing for Speech… 

Tables from this paper

Taking a HIT: Designing around Rejection, Mistrust, Risk, and Workers' Experiences in Amazon Mechanical Turk
It is argued that making risk reduction and trust building first-class design goals can lead to solutions that improve outcomes around rejected work for all parties in online labor markets.
Crowdsourcing formulaic phrases: towards a new type of spoken corpus
Spoken corpora have traditionally been assembled through careful recording and transcription of discourse events, a process which is both labour intensive and often restrictive in terms of breadth ...
Ethical Challenges in the Future of Work
This paper presents a vision of how ethical issues related to fairness, transparency, and bias in these new forms of work can be addressed in the future of work, and how this will affect the data management research community and future work platforms.
Exploiting resources from closely-related languages for automatic speech recognition in low-resource languages from Malaysia. (Utilisation de ressources dans une langue proche pour la reconnaissance automatique de la parole pour les langues peu dotées de Malaisie)
The effects of using data from closely-related languages to build ASR for low-resource languages in Malaysia are studied, including Iban, an under-resourced language spoken on the island of Borneo.
Crowdsourcing for Language Resource Development: Critical Analysis of Amazon Mechanical Turk Overpowering Use
This article is a position paper about crowdsourced microworking systems and especially Amazon Mechanical Turk, the use of which has been steadily growing in language processing in the past few years, and proposes practical and organizational solutions to improve new language resources development.
Look before you leap: Legal pitfalls of crowdsourcing
This paper considers five legal issues that the crowdsourcing community (providers and customers) should discuss, both to inform their own practice and to advise future policy.
Building a Persistent Workforce on Mechanical Turk for Multilingual Data Collection
This work presents the results of one of the largest linguistic data collection efforts to date using Mechanical Turk, yielding 85K English sentences and more than 1K sentences for each of a dozen more languages.
Working the Crowd: Employment and Labor Law in the Crowdsourcing Industry
This Article confronts the thorny questions that arise in attempting to apply traditional employment and labor law to "crowdsourcing," an emerging online labor model unlike any that has existed to date.
Who are the crowdworkers?: shifting demographics in mechanical turk
How the worker population has changed over time is described, shifting from a primarily moderate-income, U.S. based workforce towards an increasingly international group with a significant population of young, well-educated Indian workers.
CrowdForge: crowdsourcing complex work
This work presents a general purpose framework for accomplishing complex and interdependent tasks using micro-task markets, a web-based prototype, and case studies on article writing, decision making, and science journalism that demonstrate the benefits and limitations of the approach.
The Need for Standardization in Crowdsourcing
This work argues that standardization of basic “building block” tasks would make crowdsourcing more scalable and increase the demand for paid crowdsourcing, a development it argues is positive on both efficiency and welfare grounds.
Toward better crowdsourced transcription: Transcription of a year of the Let's Go Bus Information System data
This paper presents a two-stage approach for the use of MTurk to transcribe one year of Let's Go Bus Information System data, corresponding to 156.74 hours (257,658 short utterances).
Who are the Turkers? Worker Demographics in Amazon Mechanical Turk
Amazon Mechanical Turk (MTurk) is a crowdsourcing system in which tasks are distributed to a population of thousands of anonymous workers for completion. This system is becoming increasingly popular.
Using the Amazon Mechanical Turk for transcription of spoken language
It was found that transcriptions from MTurk workers were generally quite accurate, and when transcripts for the same utterance produced by multiple workers were combined using the ROVER voting scheme, the accuracy of the combined transcript rivaled that observed for conventional transcription methods.