Lourdes Aguilar

A review of the literature on prosody reveals a lack of corpora for prosodic studies in Catalan and Spanish. In this paper, we present a corpus intended to fill this gap. The corpus comprises two distinct datasets, a news subcorpus and a dialogue subcorpus, the latter containing either conversational or task-oriented speech. More than 25 h were recorded by twenty…
This article reports the process of building a balanced text corpus that takes prosodic features into account. We formalize the application of greedy algorithms to text selection and discuss their limitations. We also advocate an expert guideline for text manipulation that significantly improves the performance of the algorithms. The application of this…
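The greedy text selection described above can be pictured as a greedy set-cover loop. This is a minimal sketch under assumptions not taken from the paper: each candidate sentence is reduced to a hypothetical set of prosodic unit labels (here, made-up combinations of sentence type and stress pattern), and at each step the sentence that adds the most not-yet-covered units is chosen, up to a fixed budget. The names `greedy_select` and `candidates` are illustrative only; the paper's actual unit inventory and selection criterion may differ.

```python
from typing import Dict, List, Set


def greedy_select(candidates: Dict[str, Set[str]], budget: int) -> List[str]:
    """Hypothetical greedy selection: repeatedly take the sentence that adds the
    most not-yet-covered units. Ties are broken by the smallest sentence id."""
    covered: Set[str] = set()
    selected: List[str] = []
    remaining = dict(candidates)  # copy so the caller's dict is not modified
    while remaining and len(selected) < budget:
        # Gain of a sentence = number of its units not covered so far.
        best_id = min(remaining, key=lambda s: (-len(remaining[s] - covered), s))
        if not remaining[best_id] - covered:
            break  # no sentence adds anything new; stop early
        selected.append(best_id)
        covered |= remaining.pop(best_id)
    return selected


# Toy, made-up unit labels: "<sentence type>:<stress pattern>".
candidates = {
    "s1": {"decl:oxytone", "decl:paroxytone"},
    "s2": {"decl:paroxytone", "int:paroxytone"},
    "s3": {"int:oxytone", "int:paroxytone", "decl:proparoxytone"},
    "s4": {"decl:oxytone"},
}

print(greedy_select(candidates, budget=10))
```

With these toy labels, the loop picks `s3` (three new units), then `s1` (two more), and stops because `s2` and `s4` add nothing new. Like any greedy set-cover heuristic, it is not guaranteed to find the smallest covering subset.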
A set of tools to analyze inconsistencies observed in a Cat_ToBI labeling…
This paper reports the results of a pilot study run to assess the labeling consistency of the approach proposed in Sp-ToBI before starting large-scale production of annotations in the Glissando project. This test should help refine the model and keep the annotation conventions consistent across transcription sites. The Spanish…
This paper presents an experimental study of how corpus-based automatic labeling of prosodic information can be transferred from a source language to a different target language. Tone accent identification models trained for Spanish on the ESMA corpus are used to automatically assign ToBI tonal accent labels to the (English) Boston Radio news corpus, and…
The temporal organization of discourse has been the subject of a large body of work in several languages, with different aims: from studies that identify cues to the planning of the linguistic message to studies that propose duration models for text-to-speech systems. This work is a first step towards the description of Catalan…
This paper presents our work in the FESTCAT project, whose main goal was the development of Catalan voices for the Festival suite. In the first year, we produced the corpus and the speech data needed to build 10 voices using the Clunits (unit selection) and HTS (hidden Markov model) methods. The resulting voices are freely available on the web page…