Concatenative text-to-speech synthesis based on prototype waveform interpolation (a time frequency approach)

Abstract

Department of Electrical Engineering, Universidade Estadual de Campinas, Campinas, SP, Brazil

This paper presents preliminary methods for applying the Time-Frequency Interpolation (TFI) technique [3] to concatenative text-to-speech synthesis. The TFI technique described here is a pitch-synchronous, time-frequency formulation of the well-known Prototype Waveform Interpolation (PWI) technique [2]. The basic concepts of representing the speech signal in the time-frequency domain are described, along with techniques for performing time-scale and pitch-scale modifications. Exploiting the flexibility of TFI for spectral smoothing, a method was developed to minimize the spectral mismatch at the boundaries of the synthesis units (SUs). The proposed system was evaluated using SUs (diphones) and prosodic modifications generated by the Festival system [1]. In an informal subjective test comparing the proposed TFI system with the standard TD-PSOLA system, the proposed system was judged to produce higher-quality speech.
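As a rough illustration of the general idea behind prototype waveform interpolation, the sketch below generates intermediate pitch cycles by linearly cross-fading between two prototype cycles. It is a minimal sketch, not the TFI method of the paper: it assumes the two prototypes have already been resampled to the same length and time-aligned, and the function name `interpolate_prototypes` and its parameters are hypothetical.

```python
def interpolate_prototypes(proto_a, proto_b, num_cycles):
    """Return num_cycles pitch cycles evolving linearly from proto_a to proto_b.

    Assumption: both prototypes are single pitch cycles of equal length
    that have already been aligned with each other.
    """
    if len(proto_a) != len(proto_b):
        raise ValueError("prototypes must have the same length")
    if num_cycles < 2:
        raise ValueError("need at least two cycles")

    cycles = []
    for k in range(num_cycles):
        # Interpolation weight: 0.0 at the first cycle, 1.0 at the last.
        w = k / (num_cycles - 1)
        cycles.append([(1.0 - w) * a + w * b for a, b in zip(proto_a, proto_b)])
    return cycles


# Usage: three cycles morphing from a louder prototype to a quieter one.
cycles = interpolate_prototypes([0.0, 1.0, 0.0, -1.0],
                                [0.0, 0.5, 0.0, -0.5], 3)
```

A gradual transition of this kind across a unit boundary is one simple way to picture how interpolation can reduce the mismatch between adjacent synthesis units; a frequency-domain variant would interpolate harmonic magnitudes and phases instead of raw samples.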

Cite this paper

@inproceedings{Morais2000ConcatenativeTS, title={Concatenative text-to-speech synthesis based on prototype waveform interpolation (a time frequency approach)}, author={Edmilson Morais and Paul Taylor and F{\'a}bio Violaro}, booktitle={INTERSPEECH}, year={2000} }