Balázs Tarján

In this paper, we re-evaluate morph (data-driven subword) and word lexical models used for large vocabulary continuous speech recognition of agglutinative languages. Since such speech recognition systems are applied mostly for information retrieval purposes, we choose our evaluation metrics accordingly. A standard 3-gram language model with one million words…
Efficient large vocabulary continuous speech recognition of morphologically rich languages is a major challenge because of rapid vocabulary growth. To improve the results, various subword units, called morphs, are used as basic language elements. The improvements over the word baseline, however, range from negative to a halving of the error rate across…
Morph-based language modeling has been applied effectively to improve the accuracy of Large-Vocabulary Continuous Speech Recognition (LVCSR) systems, especially in morphologically rich languages. However, the degree of improvement varies greatly, and the underlying principles have been studied only superficially. Having a method that can predict the…
Under real-life conditions, several factors may be present that make the automatic recognition of speech difficult. The most obvious examples are background noise, peculiarities of the speaker's voice, sloppy articulation and strong emotional load. All of these pose difficult problems for robust speech recognition, but it is not exactly clear how much each…
The improvement achieved by changing the basis of speech recognition from words to morphs (various subword units) varies greatly across tasks and languages. We attempt to explore the source of this variability by investigating three LVCSR tasks corresponding to three speech genres of a highly agglutinative language. Novel, press conference…
This paper introduces our work and results related to a multiple-language continuous speech recognition task. The aim was to design a system that produces a tolerable number of recognition errors for point-of-interest words in voice navigation queries, even in the presence of real-life traffic noise. Additional challenges were that no task-specific…
This paper summarizes our recent efforts to automatically transcribe call center conversations in real time. We address the data sparseness issue caused by the small amount of transcribed training data. Accordingly, the potential of including additional non-conventional training texts is investigated first, and then morphological language models…
In this paper, the application of LVCSR (Large Vocabulary Continuous Speech Recognition) technology is investigated for real-time, resource-limited broadcast closed captioning. The work focuses on transcribing live broadcast conversational speech to make such programs accessible to deaf viewers. Due to computational limitations, real-time factor (RTF) and…
This paper summarizes our recent efforts to transcribe real-life call center conversations automatically, taking non-verbal acoustic events into account as well. Future call centers, as cognitive infocom systems, must respond automatically not only to well-formed utterances but also to spontaneous and non-word speaker manifestations, and must be robust…