While automatic methods for phonetic segmentation of speech can help with rapid annotation of corpora, most methods rely either on manually segmented data for initial training or on manual post-processing. This is very time-consuming and slows down the porting of speech systems to new languages. In the context of prosody corpora for text-to-speech (TTS)…
This paper proposes a distinction between existing multilingual synthesis systems and mixed-lingual or polyglot synthesis systems. The latter should be capable of synthesising, with the same voice, utterances that contain foreign language words or word groups. As a first step towards polyglot synthetic speech, the design and realisation of a 4-lingual…
Polyglot text-to-speech synthesis, i.e. the synthesis of sentences containing one or more inclusions from other languages, primarily depends on an accurate morpho-syntactic analyzer for such mixed-lingual texts. From the output of this analyzer, the pronunciation can be derived by means of phonological transformations which are language-specific and depend…
The SIWIS project aims to investigate spoken language translation, where both the speaker characteristics and prosody are translated. This means the translation carries not only spoken content, but also speaker identification, emotion and intent. We describe the background of the project, and present some initial approaches and results. These include the…
In text-dependent speaker verification, the speech signals have to be time-aligned. For that purpose, dynamic time warping (DTW) can be used, which performs the alignment by minimizing the Euclidean cepstral distance between the test and the reference utterance. While the cumulative Euclidean cepstral distance, which can be gathered from the DTW algorithm,…
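The alignment step described in this abstract can be illustrated with a minimal sketch of textbook DTW. This is not the authors' implementation: the function names `euclidean` and `dtw`, the symmetric unit-step pattern, and the toy one-dimensional "frames" are all assumptions made for illustration. In practice each frame would be a cepstral feature vector.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(test, ref):
    """Align two frame sequences with basic DTW (illustrative sketch).

    Returns the cumulative Euclidean distance along the best warping
    path and the path itself as (test_index, ref_index) pairs.
    Assumes a simple step pattern: diagonal, vertical, horizontal.
    """
    n, m = len(test), len(ref)
    inf = float("inf")
    # cost[i][j]: best cumulative distance aligning test[:i] with ref[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(test[i - 1], ref[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1],  # diagonal step
                                 cost[i - 1][j],      # advance test only
                                 cost[i][j - 1])      # advance ref only

    # Backtrack from the end to recover the frame-to-frame alignment.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


# Toy example: the test utterance repeats one frame of the reference.
test_frames = [[0.0], [1.0], [1.0], [2.0]]
ref_frames = [[0.0], [1.0], [2.0]]
total, alignment = dtw(test_frames, ref_frames)
print(total, alignment)
```

The returned cumulative distance corresponds to the quantity the abstract says "can be gathered from the DTW algorithm"; the alignment path gives the frame pairs on which any further frame-level comparison would operate.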
We present a new method for the estimation of a continuous fundamental frequency (F0) contour. The algorithm implements a global optimization and yields virtually error-free F0 contours for high quality speech signals. Such F0 contours are subsequently used to extract a continuous fundamental wave. Some local properties of this wave, together with a number…
We present a new approach to quasi text-independent speaker verification based on pattern matching. Our method first seeks phonetically matched segments in two speech signals. For all aligned frame pairs of these segments we compute the probability that they were uttered by the same speaker. Based on these frame-level probabilities we take the decision…
An automatic system for segmenting speech signals used for the training of statistical prosody models is presented. Starting from a canonical transcription, the system simultaneously delivers an accurate phonetic segmentation and the matched phonetic transcription indicating pronunciation variants. Although the system is HMM-based, it uses only the speech…
A polyglot text-to-speech synthesis system that is able to read aloud mixed-lingual text must first of all derive the correct pronunciation. This is achieved with an accurate morpho-syntactic analyzer that works simultaneously as a language detector, followed by a phonological component which performs various phonological transformations. The result of…