Beat Pfister

This paper proposes a distinction between existing multilingual synthesis systems and mixed-lingual or polyglot synthesis systems. The latter should be capable of synthesising, with the same voice, utterances that contain foreign-language words or word groups. As a first step towards polyglot synthetic speech, the design and realisation of a 4-lingual …
We propose a novel method for Acoustic Event Detection (AED). In contrast to speech, sounds coming from acoustic events may be produced by a wide variety of sources. Furthermore, distinguishing them often requires analyzing an extended time period due to the lack of a clear sub-word unit. In order to incorporate the long-time frequency structure for AED, we …
In multilingual countries, text-to-speech synthesis systems often have to deal with texts containing inclusions from multiple other languages in the form of phrases, words, or even parts of words. In such multilingual cultural settings, listeners expect a high-quality text-to-speech synthesis system to read such texts in a way that the origin of the inclusions is …
We propose a novel method for Acoustic Event Recognition (AER). In contrast to speech, sounds coming from acoustic events may be produced by a wide variety of sources. Furthermore, distinguishing them often requires analyzing an extended time period due to the lack of a clear sub-word unit. In order to incorporate the long-time frequency structure for AER, …
We present a new method for the estimation of a continuous fundamental frequency (F0) contour. The algorithm implements a global optimization and yields virtually error-free F0 contours for high quality speech signals. Such F0 contours are subsequently used to extract a continuous fundamental wave. Some local properties of this wave, together with a number …
An automatic system for segmenting speech signals used for the training of statistical prosody models is presented. Starting from a canonical transcription, the system simultaneously delivers an accurate phonetic segmentation and the matched phonetic transcription indicating pronunciation variants. Although the system is HMM-based, it uses only the speech …
Polyglot text-to-speech synthesis, i.e. the synthesis of sentences containing one or more inclusions from other languages, primarily depends on an accurate morphosyntactic analyzer for such mixed-lingual texts. From the output of this analyzer, the pronunciation can be derived by means of phonological transformations which are language-specific and depend …
In forensic casework, the application of automatic speaker verification (SV) aims to determine the likelihood ratio of a suspect being vs. not being the speaker of an incriminating speech recording. For that purpose, the likelihood of the anti-speaker has to be estimated from the speech of an adequate number of other speakers. In many cases, speech signals …