This paper presents an effective triphone mapping for acoustic model training in automatic speech recognition, which allows the synthesis of unseen triphones. A description of this data-driven model clustering is given, including experiments performed on 350 hours of a Slovak audio database of mixed read and spontaneous speech. The proposed …
This paper presents a rule-based triphone mapping for acoustic model training in automatic speech recognition. We test whether incorporating expanded knowledge at the level of parameter tying in acoustic modeling improves the performance of automatic speech recognition in Slovak. We propose a novel technique of knowledge-based triphone tying, which allows …
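For illustration only, the sketch below shows one way the triphone-mapping idea in the two abstracts above can be realised: an unseen triphone backs off to an acoustically similar seen triphone by relaxing its left and right contexts to broad phonetic classes. The phone-class table, the backoff order and the `map_triphone` helper are assumptions made for this sketch, not the mapping rules proposed in the papers.

```python
# Hypothetical broad phonetic classes used to judge context similarity.
PHONE_CLASS = {
    "p": "stop", "b": "stop", "t": "stop", "d": "stop", "k": "stop", "g": "stop",
    "s": "fric", "z": "fric", "f": "fric",
    "m": "nasal", "n": "nasal",
    "a": "vowel", "e": "vowel", "i": "vowel", "o": "vowel", "u": "vowel",
}

def map_triphone(unseen: str, seen: set[str]) -> str:
    """Map an unseen triphone 'l-c+r' onto a seen triphone with the same centre phone."""
    left, rest = unseen.split("-")
    centre, right = rest.split("+")
    if unseen in seen:
        return unseen
    # 1) Prefer a seen triphone whose contexts fall in the same broad classes.
    for cand in sorted(seen):
        cl, crest = cand.split("-")
        cc, cr = crest.split("+")
        if (cc == centre
                and PHONE_CLASS.get(cl) == PHONE_CLASS.get(left)
                and PHONE_CLASS.get(cr) == PHONE_CLASS.get(right)):
            return cand
    # 2) Otherwise back off to any seen triphone with the same centre phone,
    #    and finally to the context-independent model.
    for cand in sorted(seen):
        if cand.split("-")[1].split("+")[0] == centre:
            return cand
    return centre

seen_models = {"p-a+t", "k-a+p", "s-a+n"}
print(map_triphone("k-a+t", seen_models))   # -> 'k-a+p': same centre phone, matching context classes
```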
This paper presents an evaluation of the contextual factors of HMM-based speech synthesis and coding systems. Two experimental setups are proposed, based on successive context addition from phonetic to full context. The aim was to investigate the impact of the individual contextual factors on speech quality. In that sense, important and …
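As a rough illustration of the "successive context addition" setup, the sketch below builds HMM context labels from a phonetic-only level up to a fuller context. The factor names (`left_phone`, `syllable_stress`, `position_in_phrase`, ...) and the ordering of the levels are hypothetical, not the factor inventory evaluated in the paper.

```python
# Hypothetical context levels, from phonetic-only toward full context.
CONTEXT_LEVELS = [
    ["phone"],                                                   # phonetic only
    ["phone", "left_phone", "right_phone"],                      # + phonetic context
    ["phone", "left_phone", "right_phone",
     "syllable_stress", "position_in_syllable"],                 # + syllable factors
    ["phone", "left_phone", "right_phone",
     "syllable_stress", "position_in_syllable",
     "position_in_word", "position_in_phrase"],                  # toward full context
]

def make_label(context: dict, factors: list) -> str:
    """Build one context label string from the selected factors."""
    return "/".join(f"{name}={context.get(name, 'x')}" for name in factors)

example = {"phone": "a", "left_phone": "k", "right_phone": "t",
           "syllable_stress": "1", "position_in_syllable": "2"}
for level, factors in enumerate(CONTEXT_LEVELS):
    print(f"level {level}: {make_label(example, factors)}")
```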
Phonological features extracted by neural networks have shown interesting potential for low bit rate speech vocoding. The span of phonological features is wider than the span of phonetic features, and thus fewer frames need to be transmitted. Moreover, the binary nature of phonological features enables a higher compression ratio at a minor quality cost. In …
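A back-of-the-envelope sketch of why wider-span, binary features can lower the bit rate is given below. The feature count, frame rates and quantisation are illustrative assumptions, not figures reported in the abstract.

```python
# Illustrative bit-rate arithmetic only; all numbers below are assumptions.
n_features = 13          # assumed size of a binary phonological feature vector
phonetic_rate = 100      # assumed phonetic/spectral frame rate (frames/s)
phonological_rate = 25   # assumed wider-span phonological frame rate (frames/s)

binary_bps = n_features * 1 * phonological_rate   # 1 bit per binary feature, fewer frames
posterior_bps = n_features * 8 * phonetic_rate    # e.g. 8-bit posteriors at the full frame rate

print(f"binary phonological features: {binary_bps} bit/s")
print(f"8-bit posteriors at frame rate: {posterior_bps} bit/s")
```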
We investigate a vocoder based on artificial neural networks using a phonological speech representation. Speech decomposition is based on phonological encoders, realised as neural network classifiers trained for a particular language. The speech reconstruction process involves using a Deep Neural Network (DNN) to map phonological features …
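The sketch below illustrates the reconstruction step in plain NumPy: a small feed-forward network, with random untrained weights standing in for a trained model, maps one binary phonological feature vector to a vector of spectral parameters. The layer sizes, the activation and the `decode_frame` helper are assumptions made for this example, not the DNN architecture used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N_PHONOLOGICAL, N_HIDDEN, N_SPECTRAL = 13, 64, 25   # assumed dimensions

# Randomly initialised weights stand in for a trained model.
W1 = rng.standard_normal((N_PHONOLOGICAL, N_HIDDEN)) * 0.1
b1 = np.zeros(N_HIDDEN)
W2 = rng.standard_normal((N_HIDDEN, N_SPECTRAL)) * 0.1
b2 = np.zeros(N_SPECTRAL)

def decode_frame(phonological: np.ndarray) -> np.ndarray:
    """Map one binary phonological feature vector to spectral parameters."""
    h = np.tanh(phonological @ W1 + b1)     # hidden layer
    return h @ W2 + b2                      # linear output layer

frame = rng.integers(0, 2, N_PHONOLOGICAL).astype(float)   # fake binary features
print(decode_frame(frame).shape)            # (25,) spectral parameters per frame
```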
The speech signal conveys information on different time scales, from the short (20–40 ms), or segmental, time scale, associated with phonological and phonetic information, to the long (150–250 ms), or supra-segmental, time scale, associated with syllabic and prosodic information. Linguistic and neurocognitive studies recognize phonological classes at the segmental level as …
Current HMM-based low bit rate speech coding systems work with phonetic vocoders. Pitch contour coding (at the frame or phoneme level) is usually fairly orthogonal to the other speech coding parameters. In our work, we assume that the speech signal contains supra-segmental cues. Hence, we present an encoding of pitch at the syllable level, used in the …
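One simple way to realise syllable-level pitch encoding is sketched below: the F0 contour of each syllable is approximated by a low-order polynomial, so only a few coefficients per syllable need to be transmitted. The polynomial order, the toy contour and the helper names are assumptions for this illustration, not the paper's codec.

```python
import numpy as np

def encode_syllable_f0(f0: np.ndarray, order: int = 2) -> np.ndarray:
    """Fit a low-order polynomial to the F0 values of one syllable."""
    t = np.linspace(0.0, 1.0, len(f0))
    return np.polyfit(t, f0, order)          # (order + 1) coefficients to transmit

def decode_syllable_f0(coeffs: np.ndarray, n_frames: int) -> np.ndarray:
    """Reconstruct an F0 contour of n_frames from the transmitted coefficients."""
    t = np.linspace(0.0, 1.0, n_frames)
    return np.polyval(coeffs, t)

syllable_f0 = np.array([110.0, 118.0, 127.0, 131.0, 128.0, 120.0])   # toy contour (Hz)
coeffs = encode_syllable_f0(syllable_f0)
print(coeffs.round(2))
print(decode_syllable_f0(coeffs, len(syllable_f0)).round(1))
```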
The paper presents an approach to unit selection speech synthesis in noise. The approach is based on a modification of the speech synthesis method originally published in [1], where the distance of a candidate unit from its cluster center is used as the unit selection cost. We found that using an additional measure evaluating intelligibility for the …
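The sketch below illustrates the cost described above: a candidate's distance from its cluster center, optionally combined with a weighted intelligibility term. The feature vectors, the weight and the intelligibility scores are placeholders; this is not the concrete formulation of [1] or of the proposed modification.

```python
import numpy as np

def unit_cost(candidate: np.ndarray,
              cluster_center: np.ndarray,
              intelligibility: float = 1.0,
              weight: float = 1.0) -> float:
    """Distance from the cluster center plus a weighted intelligibility penalty."""
    distance = float(np.linalg.norm(candidate - cluster_center))
    return distance + weight * (1.0 - intelligibility)   # lower cost = better unit

center = np.array([1.0, 0.5, -0.2])
candidates = [np.array([1.1, 0.4, -0.1]), np.array([0.2, 1.5, 0.9])]
scores = [0.9, 0.6]   # assumed intelligibility scores in [0, 1]
best = min(range(len(candidates)),
           key=lambda i: unit_cost(candidates[i], center, scores[i]))
print("selected candidate:", best)
```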