Dragisa Miskovic

This paper reports a spoken natural language dialogue system that manages the interaction between the user and the industrial robot ABB IRB 140. The dialogue system is multimodal and uses three communication modalities: (i) spoken language (automatic speech recognition and text-to-speech synthesis), (ii) visual recognition of the figures…
In this paper, a novel method for energy normalization is presented. The objective of this method is to remove unwanted energy variations caused by different microphone gains, varying loudness levels across speakers, and changes in a single speaker's loudness level over time. The solution presented here is based on principles used in automatic gain…
This paper presents our initial results in a new approach to vocal tract normalization (VTN). In experiments based on continuous automatic speech recognition (ASR), the VTN procedure is generally carried out in both the training and test phases. In the training phase it is used to obtain speaker-independent acoustic models of phones. In the test phase it is used…
This paper proposes a method of creating language models for highly inflective, non-agglutinative languages. Three types of language models were considered: a common n-gram model, an n-gram model of lemmas, and a class n-gram model. The last two types were specially designed for the Serbian language, reflecting its unique grammatical structure. All the language…
This paper presents our study of different phonetic segmentation methods based on hidden Markov models, evaluated against a Hebrew speech corpus. We investigated methods for fully automatic phonetic segmentation using only the corpus to be segmented and automatically generated phonetic transcriptions. A new method for phonetic boundary correction…
This paper describes a decoder for large vocabulary continuous speech recognition developed at the Faculty of Technical Sciences, University of Novi Sad. The decoder is an open source solution written in the C++ programming language. The structure of the decoder is modular, allowing relatively simple modification and expansion of the code. It…
The paper reports a solution for integrating the industrial robot ABB IRB 140 with an automatic speech recognition (ASR) system and a computer vision system. The robot's task is to manipulate objects placed randomly on a pad lying on a table, and the computer vision system has to recognize their characteristics (shape, dimension,…
Speech recognition systems are commonly modelled by hidden Markov models with Gaussian mixture models as observation density functions. These models have a significant number of parameters, which usually leads to the problem of data sparsity, especially for under-resourced languages such as Serbian. One of the ways to overcome the problem of data sparsity…
This paper gives a brief review of the development of systems for automatic speech recognition and text-to-speech synthesis in the Serbian, Croatian and Macedonian languages at the Faculty of Engineering, University of Novi Sad, Serbia. The systems developed within this project enable two-way communication between humans and machines. These systems are…