Efficient Harvesting of Internet Audio for Resource-Scarce ASR

Marelie H. Davel, Charl Johannes van Heerden, Neil Kleynhans, and Etienne Barnard
Spoken recordings that have been transcribed for human reading (e.g. as captions for audiovisual material, or to provide alternative modes of access to recordings) are widely available in many languages. Such recordings and transcriptions have proven to be a valuable source of ASR data in well-resourced languages, but have not been exploited to a significant extent in under-resourced languages or dialects. Techniques used to harvest such data typically assume the availability of a fairly…
This paper has 23 citations.