Damien Lolive

In a voice transformation context, prosody transformation based on parallel corpora is unrealistic, because such corpora are difficult and expensive to build. Based on this observation, we propose an approach for transforming prosody from nonparallel corpora using the MLLR adaptation strategy. This methodology is applied to the joint transformation…
Building speech corpora is a first and crucial step for every text-to-speech synthesis system. Nowadays, the use of statistical models requires very large corpora that must be recorded, transcribed, annotated and segmented to be usable. The variety of corpora necessary for recent applications (content, style, etc.) makes the use of existing…
This article describes a new approach to estimating F0 curves using a B-spline model characterized by a knot sequence and associated control points. The free parameters of the model are the number of knots and their locations. The free-knot placement, which is an NP-hard problem, is done using a global MLE within a simulated-annealing strategy. The optimal knot…
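To make the free-knot idea concrete, here is a minimal sketch, not the authors' implementation. It assumes Gaussian noise, so that maximizing the likelihood reduces to minimizing the residual sum of squares, and it keeps the number of knots fixed while annealing only their locations. All names (`bspline_design`, `rss`, `anneal_knots`), the cooling schedule, and the synthetic contour are illustrative assumptions.

```python
import numpy as np

def bspline_design(x, interior, lo, hi, degree=3):
    """Design matrix of clamped B-spline basis functions (Cox-de Boor recursion)."""
    t = np.concatenate(([lo] * (degree + 1), np.sort(interior), [hi] * (degree + 1)))
    # Degree 0: indicator of each half-open knot interval.
    B = np.zeros((len(x), len(t) - 1))
    for j in range(len(t) - 1):
        B[:, j] = (t[j] <= x) & (x < t[j + 1])
    # The right endpoint belongs to the last non-empty interval.
    B[x == hi, len(t) - degree - 2] = 1.0
    for d in range(1, degree + 1):
        Bn = np.zeros((len(x), len(t) - 1 - d))
        for j in range(len(t) - 1 - d):
            left = t[j + d] - t[j]
            right = t[j + d + 1] - t[j + 1]
            if left > 0:
                Bn[:, j] += (x - t[j]) / left * B[:, j]
            if right > 0:
                Bn[:, j] += (t[j + d + 1] - x) / right * B[:, j + 1]
        B = Bn
    return B

def rss(x, y, interior, lo, hi, degree=3):
    """Residual sum of squares of the least-squares spline for the given knots."""
    B = bspline_design(x, interior, lo, hi, degree)
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    return float(np.sum((y - B @ coef) ** 2))

def anneal_knots(x, y, n_knots, degree=3, steps=300, t0=1.0, cooling=0.98, seed=0):
    """Simulated annealing over interior knot locations (knot count is fixed)."""
    rng = np.random.default_rng(seed)
    lo, hi = float(x.min()), float(x.max())
    eps = 1e-6 * (hi - lo)
    knots = np.linspace(lo, hi, n_knots + 2)[1:-1]  # uniform starting point
    cur = initial = rss(x, y, knots, lo, hi, degree)
    best_knots, best = knots.copy(), cur
    temp = t0
    for _ in range(steps):
        cand = knots.copy()
        i = rng.integers(n_knots)
        step = rng.normal(0.0, 0.05 * (hi - lo))
        cand[i] = np.clip(cand[i] + step, lo + eps, hi - eps)
        cand.sort()
        new = rss(x, y, cand, lo, hi, degree)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if new < cur or rng.random() < np.exp((cur - new) / temp):
            knots, cur = cand, new
            if cur < best:
                best_knots, best = knots.copy(), cur
        temp *= cooling
    return best_knots, best, initial

# Synthetic F0-like contour in Hz (made-up data, for illustration only).
x = np.linspace(0.0, 1.0, 80)
noise = np.random.default_rng(1).normal(0.0, 2.0, x.size)
y = 120.0 + 30.0 * np.sin(2 * np.pi * x) * np.exp(-2.0 * x) + noise

best_knots, best, initial = anneal_knots(x, y, n_knots=3)
B = bspline_design(x, np.array([0.25, 0.5, 0.75]), 0.0, 1.0)
```

Tracking the best configuration seen so far means the returned fit is never worse than the uniform-knot starting point, whatever the annealing trajectory does.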
The development of new methods for given speech and natural language processing tasks usually consists in annotating large corpora of data before applying machine learning techniques to train models or to extract information. Beyond scientific aspects, creating and managing such annotated data sets is a recurrent problem. While using human annotators is…
In the speech processing field, stylization of the fundamental frequency F0 has been the subject of numerous studies. Models proposed in the literature rely on knowledge stemming from phonology and linguistics. We propose an approach that deals with the issue of F0 curve stylization requiring as few linguistic assumptions as…
This paper presents a software library, ROOTS (Rich Object Oriented Transcription System), that helps to describe spoken messages in a coherent manner by linking sequences of items on numerous levels (linguistic, phonological, or acoustic). The proposed representation is incremental and can thus describe any or all parts of an utterance. In order to…
Speech synthesis systems usually use the Viterbi algorithm as a basis for unit selection, although it is not the only possible choice. In this paper, we study a speech synthesis system relying on the A* algorithm, which is a general pathfinding strategy that develops a graph rather than a lattice. Using state-of-the-art techniques, we propose and analyze…
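As a rough illustration of A* applied to unit selection, the sketch below searches over candidate units position by position. It is not the system described in the paper: the cost structure (a per-candidate target cost plus a non-negative join cost), the heuristic (sum of the cheapest remaining target costs), and every name in it are assumptions made for the example.

```python
import heapq

def astar_unit_selection(target_costs, join_cost):
    """Pick one candidate per position, minimizing target plus join costs.

    target_costs[i][k] is the cost of candidate k at position i (hypothetical).
    join_cost(i, a, b) >= 0 is the cost of joining candidate a at position i
    to candidate b at position i + 1 (hypothetical).
    """
    n = len(target_costs)
    # Admissible heuristic: cheapest remaining target costs, ignoring joins.
    remaining = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        remaining[i] = remaining[i + 1] + min(target_costs[i])
    # Heap entries are (f = g + h, g, partial path of candidate indices).
    heap = [(remaining[0], 0.0, ())]
    while heap:
        f, g, path = heapq.heappop(heap)
        i = len(path)
        if i == n:
            # First complete path popped is optimal for an admissible heuristic.
            return g, list(path)
        for k, tc in enumerate(target_costs[i]):
            g2 = g + tc
            if path:
                g2 += join_cost(i - 1, path[-1], k)
            heapq.heappush(heap, (g2 + remaining[i + 1], g2, path + (k,)))
    raise ValueError("no complete path found")

# Toy example: two candidates per position, joins are free only when the
# candidate index stays the same (made-up costs).
costs = [[1.0, 2.5], [3.0, 0.5], [1.0, 1.0]]
cost, path = astar_unit_selection(costs, lambda i, a, b: 0.0 if a == b else 1.0)
print(cost, path)
```

Unlike a Viterbi pass, which fills the whole lattice column by column, this search expands only the partial paths whose estimated total cost is currently lowest, so how much work it does depends on how tight the heuristic is.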
Traditional utterance phonetization methods concatenate pronunciations of uncontextualized constituent words. This approach is too weak for some languages, like French, where transitions between words imply pronunciation modifications. Moreover, it makes it difficult to consider global pronunciation strategies, for instance to model a specific speaker or a…