While automatic methods for phonetic segmentation of speech can help with rapid annotation of corpora, most methods rely either on manually segmented data to initially train the process or on manual post-processing. This is very time-consuming and slows down porting of speech systems to new languages. In the context of prosody corpora for text-to-speech (TTS)…
We present a new method for the estimation of a continuous fundamental frequency (F0) contour. The algorithm implements a global optimization and yields virtually error-free F0 contours for high-quality speech signals. Such F0 contours are subsequently used to extract a continuous fundamental wave. Some local properties of this wave, together with a number…
We propose a novel method for Acoustic Event Detection (AED). In contrast to speech, sounds coming from acoustic events may be produced by a wide variety of sources. Furthermore, distinguishing them often requires analyzing an extended time period due to the lack of a clear sub-word unit. In order to incorporate the long-time frequency structure for AED,…
In multilingual countries, text-to-speech synthesis systems often have to…
This paper proposes a distinction between existing multilingual synthesis systems and mixed-lingual or polyglot synthesis systems. The latter should be capable of synthesising, with the same voice, utterances that contain foreign language words or word groups. As a first step towards polyglot synthetic speech, the design and realisation of a 4-lingual…
Polyglot text-to-speech synthesis, i.e. the synthesis of sentences containing one or more inclusions from other languages, primarily depends on an accurate morpho-syntactic analyzer for such mixed-lingual texts. From the output of this analyzer, the pronunciation can be derived by means of phonological transformations which are language-specific and depend…
In forensic casework, the application of automatic speaker verification (SV) aims to determine the likelihood ratio of a suspect being vs. not being the speaker of an incriminating speech recording. For that purpose, the likelihood of the anti-speaker has to be estimated from the speech of an adequate number of other speakers. In many cases, speech signals…
We present a new approach to quasi text-independent speaker verification based on pattern matching. Our method first seeks phonetically matched segments in two speech signals. For all aligned frame pairs of these segments, we compute the probability that they were uttered by the same speaker. Based on these frame-level probabilities, we take the decision…
An automatic system for segmenting speech signals used for the training of statistical prosody models is presented. Starting from a canonical transcription, the system simultaneously delivers an accurate phonetic segmentation and the matched phonetic transcription indicating pronunciation variants. Although the system is HMM-based, it uses only the speech…