In this paper, we propose a direct method for speech rate estimation from acoustic features without requiring any automatic speech transcription. We compare various spectral and temporal signal analysis and smoothing strategies to better characterize the underlying syllable structure to derive speech rate. The proposed algorithm extends the methods of …
In this paper we describe the first phase of development of our speech-to-speech system between English and Modern Persian under the DARPA Babylon program. We give an overview of the various system components: the front end ASR, the machine translation system and the speech generation system. Challenges such as the sparseness of available spoken language …
In this paper, we propose a novel method for speech rate estimation without requiring automatic speech recognition. It extends the methods of spectral subband correlation by including temporal correlation and the selection of prominent spectral subbands for correlation. Furthermore, to address some of the practical issues in previously published …
In this paper we describe our spoken English-Persian medical dialogue translation system. We describe the data collection effort and give an overview of the component technologies, including speech recognition, translation, dialogue management, and user interface design. The individual modules and system are designed for flexibility, and to be able to …
An algorithm for automatic speech prominence detection is reported in this paper. We describe a comparative analysis of various acoustic features for word prominence detection and report results using a spoken dialog corpus with manually assigned prominence labels. The focus is on features such as spectral intensity and speech rate that are directly …
In this paper, we propose a wavelet-analysis-based piecewise linear stylization of the pitch trajectory. We also address the often-faced difficulty in handling the tradeoff between mean squared error and the number of lines used for fitting, where a heuristic approach is typically used to make the stylization choice. We pose the piecewise linear stylization …
Engineering automatic speech recognition (ASR) for speech-to-speech (S2S) translation systems, especially targeting languages and domains that do not have readily available spoken language resources, is immensely challenging for a number of reasons. In addition to contending with the conventional data-hungry speech acoustic and language modeling needs, …