Corpus ID: 114895158

Crowdsourcing for Speech: Economic, Legal and Ethical analysis

@inproceedings{Adda2014CrowdsourcingFS,
  title={Crowdsourcing for Speech: Economic, Legal and Ethical analysis},
  author={Gilles Adda and J. Mariani and Laurent Besacier and Hadrien Gelas},
  year={2014}
}
With respect to spoken language resource production, Crowdsourcing - the process of distributing tasks to an open, unspecified population via the internet - offers a wide range of opportunities: populations with specific skills are potentially instantaneously accessible somewhere on the globe for any spoken language. As is the case for most newly introduced high-tech services, crowdsourcing raises both hopes and doubts, certainties and questions. A general analysis of Crowdsourcing for Speech…

Taking a HIT: Designing around Rejection, Mistrust, Risk, and Workers' Experiences in Amazon Mechanical Turk
TLDR
It is argued that treating risk reduction and trust building as a first-class design goal can lead to solutions that improve outcomes around rejected work for all parties in online labor markets.
Crowdsourcing formulaic phrases: towards a new type of spoken corpus
Spoken corpora have traditionally been assembled through careful recording and transcription of discourse events, a process which is both labour intensive and often restrictive in terms of breadth ...
Ethical Challenges in the Future of Work
TLDR
This paper presents the vision on how ethical issues related to fairness, transparency, and bias regarding these new forms of work can be combated in the future of work, and how this will impact the data management research community and future work platforms.
Exploiting resources from closely-related languages for automatic speech recognition in low-resource languages from Malaysia. (Utilisation de ressources dans une langue proche pour la reconnaissance automatique de la parole pour les langues peu dotées de Malaisie)
TLDR
The effects of using data from closely-related languages to build ASR for low-resource languages in Malaysia, including Iban, an under-resourced language spoken on the island of Borneo, are studied.

References

Crowdsourcing for Language Resource Development: Critical Analysis of Amazon Mechanical Turk Overpowering Use
TLDR
This article is a position paper about crowdsourced microworking systems and especially Amazon Mechanical Turk, the use of which has been steadily growing in language processing in the past few years, and proposes practical and organizational solutions to improve new language resources development.
Look before you leap: Legal pitfalls of crowdsourcing
TLDR
This paper considers five legal issues that the crowdsourcing community (providers and customers) should discuss, both to inform their own practice and to advise future policy.
Building a Persistent Workforce on Mechanical Turk for Multilingual Data Collection
Traditional methods of collecting translation and paraphrase data are prohibitively expensive, making the construction of large, new corpora difficult. While crowdsourcing offers a cheap alternative, ...
Working the Crowd: Employment and Labor Law in the Crowdsourcing Industry
This Article confronts the thorny questions that arise in attempting to apply traditional employment and labor law to “crowdsourcing,” an emerging online labor model unlike any that has existed to ...
Who are the crowdworkers?: shifting demographics in mechanical turk
TLDR
How the worker population has changed over time is described, shifting from a primarily moderate-income, U.S.-based workforce towards an increasingly international group with a significant population of young, well-educated Indian workers.
CrowdForge: crowdsourcing complex work
TLDR
This work presents a general purpose framework for accomplishing complex and interdependent tasks using micro-task markets, a web-based prototype, and case studies on article writing, decision making, and science journalism that demonstrate the benefits and limitations of the approach.
The Need for Standardization in Crowdsourcing
Crowdsourcing has shown itself to be well-suited for the accomplishment of certain kinds of small tasks, yet many crowdsourceable tasks still require extensive structuring and managerial effort ...
Toward better crowdsourced transcription: Transcription of a year of the Let's Go Bus Information System data
TLDR
This paper presents a two-stage approach for the use of MTurk to transcribe one year of Let's Go Bus Information System data, corresponding to 156.74 hours (257,658 short utterances).
Who are the Turkers? Worker Demographics in Amazon Mechanical Turk
Amazon Mechanical Turk (MTurk) is a crowdsourcing system in which tasks are distributed to a population of thousands of anonymous workers for completion. This system is becoming increasingly popular ...
Using the Amazon Mechanical Turk for transcription of spoken language
TLDR
It was found that transcriptions from MTurk workers were generally quite accurate, and when transcripts for the same utterance produced by multiple workers were combined using the ROVER voting scheme, the accuracy of the combined transcript rivaled that observed for conventional transcription methods.