• Corpus ID: 8414900

The Fisher Corpus: a Resource for the Next Generations of Speech-to-Text

@inproceedings{Cieri2004TheFC,
  title={The Fisher Corpus: a Resource for the Next Generations of Speech-to-Text},
  author={Christopher Cieri and David Miller and Kevin Walker},
  booktitle={International Conference on Language Resources and Evaluation},
  year={2004}
}
This paper describes, within the context of the DARPA EARS program, the design and implementation of the Fisher protocol for collecting conversational telephone speech, which has yielded more than 16,000 English conversations. It also discusses the Quick Transcription specification that allowed 2,000 hours of Fisher audio to be transcribed in less than one year. Fisher data is already in use within the DARPA EARS programs and will be published via the Linguistic Data Consortium for general use…

Development of a speech-to-text transcription system for Finnish

This paper describes the development of a speech-to-text transcription system for the Finnish language, carried out without any detailed manual transcriptions, relying instead on several sources of audio and textual data found on the web.

IMS-Speech: A Speech to Text Tool

IMS-Speech is a web-based tool for German and English speech transcription that aims to facilitate research in disciplines requiring access to lexical information in spoken language materials. It is freely available to academic researchers.

Techniques for rapid and robust topic identification of conversational telephone speech

A modified TF-IDF feature weighting calculation is presented that provides significant robustness under various recognition error conditions. Classifiers incorporating confidence information are observed to be significantly more robust to errors than those treating recognition output as unweighted text.

Transcription of Russian conversational speech

Initial work in transcribing conversational telephone speech in Russian using acoustic seed models derived from other languages achieves results comparable to those obtained with models trained on the small conversational telephone speech corpus.

Semi-Supervised Model Training for Unbounded Conversational Speech Recognition

This work proposes a technique to construct a modern, high quality conversational speech training corpus on the order of hundreds of millions of utterances (or tens of thousands of hours) for both acoustic and language model training.

Development of a Korean speech recognition system with little annotated data

This paper investigates the development of a speech-to-text transcription system for the Korean language in the context of the DGA RAPID Rapmat project, assessing the influence of the vocabulary size, the type of language model, and the acoustic unit, as well as incremental batch vs. iterative decoding of the untranscribed audio corpus.

The 2007 AMI(DA) System for Meeting Transcription

This paper describes the development and system architecture of the 2007 AMIDA meeting transcription system, the third such system developed in a collaboration of six research sites, which showed very competitive performance.

Generative Spoken Dialogue Language Modeling

dGSLM is introduced, the first “textless” model able to generate audio samples of naturalistic spoken dialogues; it reproduces more naturalistic and fluid turn taking than a text-based cascaded model.

Adapting Lexical and Language Models for Transcription of Highly Spontaneous Spoken Czech

Transitions between the most frequent colloquial words and their counterparts in formal Czech are introduced to solve the data sparsity problem when computing a probabilistic language model.

Statistical parametric speech synthesis using conversational data and phenomena

The synthesis of filled pauses is investigated in relation to specific phonetic modelling of filled pauses and through techniques for mixing standard prompts with spontaneous utterances, in order to retain the higher quality of voices based on standard speech while still using the spontaneous speech for filled pause modelling.
