In this paper, we propose a direct method for speech rate estimation from acoustic features that does not require any automatic speech transcription. We compare various spectral and temporal signal analysis and smoothing strategies to better characterize the underlying syllable structure from which speech rate is derived. The proposed algorithm extends the methods of …
In this paper, we propose a novel method for speech rate estimation that does not require automatic speech recognition. It extends the methods of spectral subband correlation by including temporal correlation and by selecting prominent spectral subbands for correlation. Furthermore, to address some of the practical issues in previously published …
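The subband-correlation approach described in these two abstracts can be illustrated with a minimal NumPy sketch. It is not the authors' algorithm: the equal-width FFT bands, the rule of keeping the `top_m` most energetic bands per frame, the temporal window length `k`, the moving-average smoother, and the peak-picking threshold and minimum spacing are all assumptions chosen for illustration. Each surviving peak of the smoothed correlation envelope is treated as a syllable nucleus, and speech rate is reported as peaks per second.

```python
import numpy as np

def subband_energies(x, sr, n_bands=8, frame_s=0.02, hop_s=0.01):
    """Return an (n_bands, n_frames) array of per-frame energy in equal-width FFT bands."""
    flen, hop = int(frame_s * sr), int(hop_s * sr)
    n_frames = 1 + (len(x) - flen) // hop
    win = np.hanning(flen)
    frames = np.stack([x[i * hop:i * hop + flen] * win for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Split the frequency bins into n_bands contiguous groups and sum each group.
    return np.stack([b.sum(axis=1) for b in np.array_split(power, n_bands, axis=1)])

def pairwise_mean_product(v):
    """Mean of v[i] * v[j] over all pairs i < j along axis 0."""
    n = v.shape[0]
    total = v.sum(axis=0)
    # sum_{i<j} v_i v_j = (S^2 - sum v^2) / 2, and there are n(n-1)/2 pairs.
    return (total ** 2 - (v ** 2).sum(axis=0)) / (n * (n - 1))

def correlation_envelope(energies, top_m=4, k=5):
    """Cross-subband correlation, then temporal correlation, then smoothing (illustrative)."""
    # Assumed selection rule: keep only the top_m most energetic subbands in each frame.
    top = np.sort(energies, axis=0)[-top_m:]
    spectral = pairwise_mean_product(top)  # one value per frame
    # Temporal correlation over a sliding window of k consecutive frames.
    windows = np.lib.stride_tricks.sliding_window_view(spectral, k)  # (n - k + 1, k)
    temporal = pairwise_mean_product(windows.T)
    # Simple moving-average smoothing of the envelope.
    return np.convolve(temporal, np.ones(k) / k, mode="same")

def count_peaks(env, rel_threshold=0.1, min_dist=10):
    """Count local maxima above rel_threshold * max that are at least min_dist frames apart."""
    thr = rel_threshold * env.max()
    cand = [i for i in range(1, len(env) - 1)
            if env[i] > thr and env[i] >= env[i - 1] and env[i] > env[i + 1]]
    kept = []
    # Greedily keep the tallest peaks, suppressing weaker ones that are too close.
    for i in sorted(cand, key=lambda idx: env[idx], reverse=True):
        if all(abs(i - j) >= min_dist for j in kept):
            kept.append(i)
    return len(kept)

def speech_rate(x, sr):
    """Estimated syllable-like peaks per second of input audio."""
    env = correlation_envelope(subband_energies(x, sr))
    return count_peaks(env) / (len(x) / sr)
```

The products across subbands favor frames in which several bands are energetic at once, as in vowels, and the products across frames favor energy that persists, so isolated noise spikes are attenuated before peak picking. On synthetic noise amplitude-modulated at 4 Hz for two seconds, the sketch should count eight bursts, or roughly 4 peaks per second.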
An algorithm for automatic speech prominence detection is reported in this paper. We present a comparative analysis of various acoustic features for word prominence detection and report results on a spoken dialog corpus with manually assigned prominence labels. The focus is on features such as spectral intensity and speech rate that are directly …
We propose a multi-pass linear fold algorithm for sentence boundary detection in spontaneous speech. It uses only prosodic cues and does not rely on segmentation information from a speech recognition decoder. We focus on features based on pitch breaks and pitch durations, study their local and global structural properties, and find their relationship with …
In this paper, we propose a wavelet-analysis-based piecewise linear stylization of the pitch trajectory. We also address the common difficulty of handling the tradeoff between mean squared error and the number of lines used for fitting, where a heuristic approach is typically used to make the stylization choice. We pose the piecewise linear stylization …
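The tradeoff between fitting error and the number of lines can be made explicit with a penalized least-squares criterion. The sketch below does not use wavelets and is not the stylization method of this paper; it substitutes a standard dynamic-programming segmentation that minimizes total squared error plus a fixed `penalty` per line, so the penalty takes the place of a heuristic choice of line count. The function names, the use of disjoint (not endpoint-sharing) segments, and the penalty values are illustrative assumptions.

```python
import numpy as np

def segment_sse(t, y, i, j):
    """Sum of squared errors of a least-squares line through points i..j (inclusive)."""
    tt, yy = t[i:j + 1], y[i:j + 1]
    if len(tt) <= 2:
        return 0.0  # a line through two points fits exactly
    slope, intercept = np.polyfit(tt, yy, 1)
    return float(np.sum((yy - (slope * tt + intercept)) ** 2))

def stylize(t, y, penalty):
    """Segment (t, y) into disjoint lines minimizing total SSE + penalty * number_of_lines."""
    n = len(y)
    best = np.full(n + 1, np.inf)  # best[j]: minimum cost of covering points 0..j-1
    best[0] = 0.0
    prev = np.zeros(n + 1, dtype=int)
    for j in range(2, n + 1):  # a segment ends at point j - 1
        for i in range(0, j - 1):  # and starts at point i, so it has >= 2 points
            cost = best[i] + segment_sse(t, y, i, j - 1) + penalty
            if cost < best[j]:
                best[j], prev[j] = cost, i
    # Backtrack from the last point to recover (start, end) index pairs.
    bounds, j = [], n
    while j > 0:
        i = int(prev[j])
        bounds.append((i, j - 1))
        j = i
    return bounds[::-1]
```

Raising `penalty` trades fit for fewer lines. For a contour that follows `y = t` for `t = 0..9` and `y = 2t - 30` for `t = 10..19`, `penalty=1.0` recovers the two segments `[(0, 9), (10, 19)]`, while `penalty=1e6` collapses the output to a single line `[(0, 19)]`.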
Engineering automatic speech recognition (ASR) for speech-to-speech (S2S) translation systems, especially those targeting languages and domains that do not have readily available spoken language resources, is immensely challenging for a number of reasons. In addition to contending with the conventional data-hungry speech acoustic and language modeling needs, …
An unsupervised approach for automatic speech prominence detection is proposed in this paper. The algorithm scores prominence by fusing different acoustic feature sets from the speech signal correlation envelope. In addition, we investigate part of speech (POS) as a linguistic correlate of speech prominence. We also underscore the inadequacy of the …
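One simple way to realize unsupervised feature fusion of this kind is to z-normalize each per-word feature within an utterance and average the normalized values into a single prominence score. This is a hypothetical sketch, not the paper's scoring function: the equal weighting, the feature columns (for example, correlation-envelope peak height or word duration), and the 0.5 decision threshold are all assumptions.

```python
import numpy as np

def prominence_scores(features):
    """features: (n_words, n_features) array; returns one fused score per word.

    Each feature column is z-normalized across the utterance, then the
    normalized columns are averaged with equal weight (an assumed fusion rule).
    """
    f = np.asarray(features, dtype=float)
    std = f.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    z = (f - f.mean(axis=0)) / std
    return z.mean(axis=1)

def label_prominent(features, threshold=0.5):
    """Mark words whose fused score exceeds a hypothetical threshold."""
    return prominence_scores(features) > threshold
```

Because each column is normalized within the utterance, the score reflects how much a word stands out relative to its neighbors rather than its absolute level, which is one reason such a scheme can operate without labeled training data.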