Using the Amazon Mechanical Turk for transcription of spoken language
Matthew Marge, Satanjeev Banerjee, Alexander I. Rudnicky
2010 IEEE International Conference on Acoustics, Speech and Signal Processing

We investigate whether Amazon's Mechanical Turk (MTurk) service can be used as a reliable method for transcription of spoken language data. [...] Transcriptions were compared against transcriptions carefully prepared in-house through conventional (manual) means. We found that transcriptions from MTurk workers were generally quite accurate. Further, when transcripts for the same utterance produced by multiple workers were combined using the ROVER voting scheme, the accuracy of the combined transcript …
Using Amazon Mechanical Turk for Transcription of Non-Native Speech
The results show that the merged MTurk transcriptions are as accurate as an individual expert transcriber for the read-aloud responses, and are only slightly less accurate for the spontaneous responses.
Using the Amazon Mechanical Turk to Transcribe and Annotate Meeting Speech for Extractive Summarization
It is found that MTurk can be used to produce high-quality transcription, and two techniques for doing so, voting and corrective, are described; the resulting quality is comparable to that obtained using trained personnel.
Crowdsourcing Transcription Beyond Mechanical Turk
This work presents a qualitative and quantitative analysis of eight crowdsourcing service providers for transcription and assess tradeoffs among the quality, cost, risk and effort of alternative crowd-based transcription options.
Evaluation of crowdsourcing transcriptions for African languages
We evaluate the quality of speech transcriptions acquired by crowdsourcing to develop ASR acoustic models (AMs) for under-resourced languages. We have developed AMs using reference (REF) …
Crowdsourcing Transcription Beyond Mechanical Turk (Haofeng Zhou)
While much work has studied crowdsourced transcription via Amazon’s Mechanical Turk, we are not familiar with any prior cross-platform analysis of crowdsourcing service providers for transcription.
Quality Assessment of Crowdsourcing Transcriptions for African Languages
It is concluded that it is possible to acquire quality transcriptions from the crowd for under-resourced languages using Amazon's Mechanical Turk, and some legal and ethical issues to consider are identified.
Language coverage for mismatched crowdsourcing
Phonological properties of different languages are discussed in a coding-theoretic framework, and nonnative phoneme misperception is modeled as a noisy communication channel.
Crowd-sourcing for difficult transcription of speech
Three new methods of crowd-sourcing are developed, which allow explicit trade-offs among precision, recall, and cost, and the effects of various task design factors on transcription latency and accuracy are studied.
On Employing a Highly Mismatched Crowd for Speech Transcription
This paper considers transcription of spoken Russian words by a rural Indian crowd that is unfamiliar with Russian and has very limited knowledge of English, and shows that if the script constraint is removed, then countries like India can provide a significantly larger crowd base.
A collective data generation method for speech language models
The results show that AMT text queries are effective for initial language model training for spoken dialogue systems, and that crowd-sourced speech collection within the context of a spoken dialogue framework provides significant improvement.


A self-labeling speech corpus: collecting spoken words with an online educational game
It is found that one third of the speech collected with Voice Race could be automatically transcribed with over 98% accuracy, and that an additional 49% could be labeled cheaply by Amazon Mechanical Turk workers.
Fast, Cheap, and Creative: Evaluating Translation Quality Using Amazon’s Mechanical Turk
It is found that, when combined, non-expert judgments have a high level of agreement with the existing gold-standard judgments of machine translation quality and correlate more strongly with expert judgments than BLEU does. Mechanical Turk can be used to calculate human-mediated translation edit rate (HTER), to conduct reading comprehension experiments with machine translation, and to create high-quality reference translations.
A self-transcribing speech corpus: collecting continuous speech with an online educational game
It is shown that Amazon Mechanical Turk can be used to orthographically transcribe utterances in the corpus quickly and cheaply, with near-expert accuracy, and the usefulness of such self-transcribed data for acoustic model adaptation is demonstrated.
Cheap and Fast – But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks
This work explores the use of Amazon's Mechanical Turk system, a significantly cheaper and faster method for collecting annotations from a broad base of paid non-expert contributors over the Web, and proposes a technique for bias correction that significantly improves annotation quality on two tasks.
What did they actually say? agreement and disagreement among transcribers of non-native spontaneous speech responses in an English proficiency test
This paper presents an analysis of word-level differences in human transcriptions of non-native spontaneous speech, collected in the context of an English proficiency test, and finds substantially higher disagreement rates between transcribers of non-native speech.
The Carnegie Mellon Communicator corpus
A portion of this corpus, covering the years 1999-2001, is being published for research purposes, and a number of procedures for managing the transcription process and for ensuring accuracy are described.
Crowdsourcing user studies with Mechanical Turk
Although micro-task markets have great potential for rapidly collecting user measurements at low costs, it is found that special care is needed in formulating tasks in order to harness the capabilities of the approach.
A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER)
  • J. Fiscus, 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings
A post-recognition process which models the output generated by multiple ASR systems as independent knowledge sources that can be combined and used to generate an output with reduced error rate.
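The voting step of ROVER can be illustrated with a minimal sketch. This example assumes the hypotheses have already been aligned word-for-word (real ROVER first builds a word transition network via dynamic-programming alignment, which is not shown here); the function name `rover_vote` and the `"@"` null-arc marker are illustrative conventions, not part of the original tool.

```python
from collections import Counter

def rover_vote(transcripts):
    """Word-level majority vote over pre-aligned transcripts.

    Simplifying assumption: every transcript has the same number of
    tokens, with "@" standing in for a null (deletion) arc, so each
    column of words corresponds to one slot in the transition network.
    """
    token_rows = [t.split() for t in transcripts]
    if len({len(row) for row in token_rows}) != 1:
        raise ValueError("transcripts must be pre-aligned to equal length")
    combined = []
    for slot in zip(*token_rows):
        # Pick the most frequent word in this alignment slot.
        word, _count = Counter(slot).most_common(1)[0]
        if word != "@":  # drop null arcs from the combined output
            combined.append(word)
    return " ".join(combined)

hypotheses = [
    "the cat sat on the mat",
    "the cat sat in the mat",
    "the bat sat on the mat",
]
print(rover_vote(hypotheses))  # -> "the cat sat on the mat"
```

Each individual hypothesis contains one error, but no two hypotheses make the same error, so the column-wise majority recovers the correct string; this is the intuition behind combining multiple MTurk transcripts of the same utterance.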
The NIST Rich Transcription Evaluation Project