To improve the quality of the speech produced by a Text-to-Speech (TTS) system, it is important to extract as much helpful information as possible from the input text. This covers a wide range of possibilities, from the simple conversion of non-orthographic items to more complex syntactic and semantic analysis. In this paper, ...
This paper presents the results of our effort in improving the accuracy of a DTW-based automatic phonetic aligner. The adopted model assumes that the phonetic segment sequence is already known and so the goal is only to align the spoken utterance with a reference synthetic signal produced by waveform concatenation without prosodic modifications. Instead of ...
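As a rough illustration of the alignment idea described in this abstract, the sketch below runs textbook dynamic time warping (DTW) between a reference sequence and a spoken sequence, then carries a known boundary from the reference over to the spoken signal. It is not the aligner from the paper: the scalar per-frame features, the absolute-difference distance, and the names `dtw_path` and `transfer_boundary` are all assumptions made for the example.

```python
def dtw_path(ref, test):
    """Textbook DTW between two 1-D feature sequences.

    Returns (total_cost, path), where path is a list of (ref_index, test_index)
    pairs. Scalar features and absolute difference are illustrative choices.
    """
    n, m = len(ref), len(test)
    inf = float("inf")
    # acc[i][j] = minimal accumulated cost of aligning ref[:i] with test[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(ref[i - 1] - test[j - 1])
            acc[i][j] = d + min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])

    # Backtrack from the end, always moving to the cheapest predecessor.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return acc[n][m], path


def transfer_boundary(path, ref_index):
    """Map a boundary frame in the reference to the first aligned test frame."""
    return min(t for r, t in path if r == ref_index)


# Hypothetical toy data: the reference has a segment boundary at frame 2.
reference = [0, 0, 1, 1, 0]
spoken = [0, 0, 0, 1, 1, 1, 0]
cost, path = dtw_path(reference, spoken)
boundary_in_spoken = transfer_boundary(path, 2)
```

Because the segment boundaries of the synthetic reference are known in advance, mapping them through the warping path is what yields boundaries for the spoken utterance. A practical aligner would use spectral feature vectors per frame rather than scalars.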
Current time-domain pitch modification techniques have well-known limitations for large variations of the original fundamental frequency. This paper proposes a technique for changing the pitch and duration of a speech signal based on time-scaling the linear prediction (LP) residual. The resulting speech signal achieves better quality than the traditional ...
While global characteristics of the speaker's source and spectral features have been successfully employed in pathological voice detection, the underlying text has largely been ignored. In this work, we focus on experiments that exploit the text stimulus that is read by the subject. Features derived from text include the mean cepstral distortion of the ...
This paper describes a new generic text-to-speech synthesis system, developed within the scope of the Tecnovoz Project. Although it was primarily targeted at speech synthesis in European Portuguese, its modular architecture and flexible components allow its use for different languages. We also provide a survey of the development of the language resources ...
In this paper we describe a multipurpose Spoken Dialogue System platform associated with two distinct applications: a home intelligent environment and remote access to information databases. These applications differ substantially in content and possible uses, but give us the chance to develop a platform where we were able to represent diverse services ...
This work is focused on the evaluation of different methods to estimate the amount of jitter present in speech signals. The jitter value is a measure of the irregularity of a quasiperiodic signal and is a good indicator of the presence of pathologies in the larynx such as vocal fold nodules or a vocal fold polyp. Given the irregular nature of the speech ...
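To make the notion of jitter in this abstract concrete, the sketch below computes one widely used definition, local (relative) jitter: the mean absolute difference between consecutive pitch periods divided by the mean period. The abstract does not say which estimators were evaluated, so this is an assumed example and not necessarily one of the paper's methods. The function name and the period values are hypothetical.

```python
def local_jitter_percent(periods):
    """Local jitter in percent, from a list of consecutive pitch periods (seconds).

    Defined here as mean |T[i+1] - T[i]| divided by the mean period, times 100.
    This is one common definition, assumed for illustration.
    """
    if len(periods) < 2:
        raise ValueError("need at least two pitch periods")
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    mean_abs_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return 100.0 * mean_abs_diff / mean_period


# Hypothetical period sequences: a perfectly regular voice and an irregular one.
steady = local_jitter_percent([0.010, 0.010, 0.010])
irregular = local_jitter_percent([0.010, 0.011, 0.010, 0.011])
```

A perfectly periodic signal gives zero jitter, and the value grows as cycle-to-cycle period variation increases. In practice the difficult part is extracting reliable pitch periods from the speech waveform, which this sketch takes as given.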
This paper proposes a statistical phrase/accent model of voice fundamental frequency (F0) for speech synthesis. It presents an approach for automatic extraction and modeling of phrase and accent phenomena from F0 contours by taking into account their overall trends in the training data. An iterative optimization algorithm is described to extract these ...
In this paper we describe a set of experiments aiming at building and evaluating a new phrasing module for European Portuguese Text-to-Speech Synthesis, using Classification and Regression Tree (CART) techniques on hand-labeled texts. Using the assessment criteria of matching boundary predictions against a reference example of phrased sentences, the best ...