Jan Hoidekr

This paper describes the design of the first large-scale IR test collection built for the Czech language. Creating this collection is particularly challenging because it is based on a continuous text stream produced by automatic transcription of spontaneous speech and thus lacks clearly defined document boundaries. All aspects of the collection building …
This paper describes our work on the automatic transcription of TV ice-hockey commentaries. The ice-hockey matches were played during the World Championships 2000 and 2001 in St. Petersburg (Russia) and Hannover (Germany), respectively, and were broadcast by the Czech TV channels NOVA and CTV1 with accompanying commentary by Robert Záruba. Annotation …
In this paper, we present a method for incorporating available linguistic information into a statistical language model used in an ASR system for transcribing spontaneous speech. We employ the class-based language model paradigm and use morphological tags as the basis for the word-to-class mapping. Since the number of different tags is at least by one …
This article describes a real-time speech recognition system for closed-captioning of TV ice-hockey commentaries. Automatic transcription of the TV commentary accompanying an ice-hockey match is usually a hard task because the commentator's spontaneous speech is often set against very loud background noise created by the crowd, music, sirens, drums, whistles, …