Structure and annotation of Polish LVCSR speech database

Katarzyna Klessa and Grazyna Demenko
This paper reports on problems encountered in building LVCSR (Large Vocabulary Continuous Speech Recognition) corpora, based on an internal evaluation of the Polish database JURISDIC. The initial assumptions are discussed together with technical matters concerning the realization of the database and the annotation results. Providing rich database statistics, especially with regard to linguistic description, was considered crucial both for database evaluation and for the implementation of…


