This paper presents effective triphone mapping for acoustic model training in automatic speech recognition, which allows the synthesis of unseen triphones. We describe this data-driven model clustering and report experiments performed on 350 hours of a Slovak audio database of mixed read and spontaneous speech. The proposed …
This paper presents rule-based triphone mapping for acoustic model training in automatic speech recognition. We test whether incorporating expanded knowledge at the level of parameter tying in acoustic modeling improves the performance of automatic speech recognition in Slovak. We propose a novel technique of knowledge-based triphone tying, which allows …
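The two abstracts above rely on the idea of mapping a context-dependent triphone that was never observed in training onto an acoustically similar seen triphone. The sketch below illustrates that general idea only; the back-off order, the function name, and the example triphones are illustrative assumptions, not the authors' actual data-driven or rule-based mapping.

```python
# Minimal sketch of mapping an unseen triphone onto a seen one; the
# back-off order and all names here are illustrative, not the authors'
# actual mapping rules.

def map_triphone(unseen, seen_triphones):
    """Map an unseen triphone 'l-c+r' to a seen triphone with the same
    centre phone, preferring a shared right context, then a shared left
    context, and finally backing off to the centre monophone."""
    left, rest = unseen.split("-")
    centre, right = rest.split("+")

    same_centre = [t for t in seen_triphones
                   if t.split("-")[1].split("+")[0] == centre]
    for keep in (lambda t: t.endswith("+" + right),
                 lambda t: t.startswith(left + "-")):
        found = sorted(t for t in same_centre if keep(t))
        if found:
            return found[0]
    return centre  # monophone back-off


seen = {"a-b+c", "x-b+c", "a-b+d"}
print(map_triphone("q-b+c", seen))  # -> 'a-b+c'
```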
Recent work in text-to-speech synthesis has pointed to the benefit of using a continuous pitch estimate, that is, one that records pitch even when voicing is not present. Such an approach typically requires interpolation. The purpose of this letter is to show that a continuous pitch estimate is available from a combination of otherwise well-known …
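As a concrete illustration of the interpolation step mentioned above, the following minimal sketch fills unvoiced gaps in an ordinary frame-level F0 track by linear interpolation to obtain a continuous contour. The function name, frame values, and the use of zero to mark unvoiced frames are assumptions for the example, not the letter's actual method.

```python
# Hypothetical illustration of obtaining a continuous pitch contour by
# interpolating a frame-level F0 track through unvoiced regions.
import numpy as np

def continuous_f0(f0, unvoiced_value=0.0):
    """Linearly interpolate F0 over frames marked as unvoiced.

    f0: 1-D array of frame-level F0 estimates (Hz), with
        `unvoiced_value` in unvoiced frames.
    """
    f0 = np.asarray(f0, dtype=float)
    voiced = f0 != unvoiced_value
    if not voiced.any():
        return f0.copy()
    frames = np.arange(len(f0))
    # np.interp extends the first/last voiced value to the edges.
    return np.interp(frames, frames[voiced], f0[voiced])


track = [0, 0, 120, 124, 0, 0, 130, 0]
print(continuous_f0(track))  # unvoiced gaps filled by interpolation
```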
Phonological features extracted by neural networks have shown interesting potential for low-bit-rate speech vocoding. The span of phonological features is wider than the span of phonetic features, and thus fewer frames need to be transmitted. Moreover, the binary nature of phonological features enables a higher compression ratio at a minor quality cost. In …
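To make the bit-rate argument above concrete, the back-of-the-envelope sketch below quantises each phonological feature to a single bit and multiplies by an assumed frame rate. The feature count, frame rate, and resulting figure are illustrative assumptions only, not numbers from the paper.

```python
# Illustrative (assumed numbers) sketch of why binary phonological
# features at a lower frame rate reduce the transmitted bit rate.
import numpy as np

def pack_binary_features(features):
    """Quantise [0,1]-valued features to 1 bit each and pack to bytes."""
    bits = (np.asarray(features) >= 0.5).astype(np.uint8)
    return np.packbits(bits, axis=-1)

n_features = 24          # hypothetical phonological feature set size
frame_rate_hz = 50       # wider-span features -> fewer frames per second
print(n_features * frame_rate_hz, "bit/s before entropy coding")

frame = np.random.rand(n_features)
print(pack_binary_features(frame))  # 3 bytes per frame
```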
The speech signal conveys information on different time scales, from a short (20–40 ms), segmental scale associated with phonological and phonetic information, to a long (150–250 ms), supra-segmental scale associated with syllabic and prosodic information. Linguistic and neurocognitive studies recognize the phonological classes at the segmental level as …
  • Milos Cernak, 2006
The paper presents an approach to unit-selection speech synthesis in noise. The approach is based on a modification of the speech synthesis method originally published in A.W. Black and P. Taylor (1997), where the distance of a candidate unit from its cluster center is used as the unit selection cost. We found that using an additional measure evaluating …
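The following schematic shows how a cluster-centre distance can be combined with an additional measure into a single unit-selection cost, as the abstract describes. The weights, the extra measure, and the candidate values are placeholders introduced for illustration; they are not the paper's cost function or data.

```python
# Schematic unit-selection cost combining the cluster-centre distance of
# Black & Taylor (1997) with an additional, hypothetical penalty term.
# Weights and the extra measure are placeholders, not the paper's values.

def unit_cost(cluster_distance, extra_measure, w_cluster=1.0, w_extra=0.5):
    """Target cost of one candidate unit."""
    return w_cluster * cluster_distance + w_extra * extra_measure

def select_unit(candidates):
    """Pick the candidate with the lowest combined target cost."""
    return min(candidates, key=lambda c: unit_cost(c["cluster_distance"],
                                                   c["extra_measure"]))

candidates = [
    {"name": "u1", "cluster_distance": 0.20, "extra_measure": 0.9},
    {"name": "u2", "cluster_distance": 0.35, "extra_measure": 0.1},
]
print(select_unit(candidates)["name"])  # -> 'u2'
```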
We investigate a vocoder based on artificial neural networks using a phonological speech representation. Speech decomposition is based on phonological encoders, realised as neural network classifiers trained for a particular language. The speech reconstruction process uses a Deep Neural Network (DNN) to map phonological features …
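The reconstruction step described above can be pictured as a feed-forward network that maps per-frame phonological features to vocoder parameters. The sketch below assumes an arbitrary layer layout, feature count, and output dimensionality; it is not the paper's architecture.

```python
# Minimal sketch (assumed architecture, not the paper's exact network)
# of a feed-forward DNN mapping per-frame phonological features to
# vocoder parameters for speech reconstruction.
import torch
import torch.nn as nn

N_PHONOLOGICAL = 24   # illustrative phonological feature count
N_VOCODER = 60        # e.g. cepstral coefficients plus excitation parameters

decoder = nn.Sequential(
    nn.Linear(N_PHONOLOGICAL, 256),
    nn.Tanh(),
    nn.Linear(256, 256),
    nn.Tanh(),
    nn.Linear(256, N_VOCODER),
)

# One batch of 100 frames of phonological posteriors in [0, 1].
phono = torch.rand(100, N_PHONOLOGICAL)
vocoder_params = decoder(phono)
print(vocoder_params.shape)  # torch.Size([100, 60])
```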