This paper gives the final results of the ESTER evaluation campaign which started in 2003 and ended in January 2005. The aim of this campaign was to evaluate automatic broadcast news rich transcription systems for the French language. The evaluation tasks were divided into three main categories: orthographic transcription, event detection and tracking (e.g. …
The aims of the SpeechDat-Car project are to develop a set of speech databases to support training and testing of multilingual speech recognition applications in the car environment. As a result, a total of ten (10) equivalent and similar resources will be created. The 10 languages are Danish, … For each language, 600 sessions will be recorded (from at least 300 …
(1) Association Francophone de la Communication Parlée; (2) DGA/Centre technique d'Arcueil, 16 bis av Prieur de la Côte d'Or; (3) ELDA, 55-57 rue Brillat Savarin. Abstract: This paper gives an overview of the ESTER evaluation campaign. The aim of this campaign is to evaluate automatic broadcast news transcription systems for the French language. The evaluation …
The SpeechDat project aims to produce speech databases for all official languages of the European Union and some major dialectal variants and minority languages, resulting in 28 speech databases. They will be recorded over fixed and mobile telephone networks. This will provide a realistic basis for training and assessment of both isolated and …
The C-ORAL-ROM project has delivered a multilingual corpus of spontaneous speech for the main Romance languages (Italian, French, Portuguese and Spanish). The collection aims to represent the variety of speech acts performed in everyday language and to enable the description of prosodic and syntactic structures in the four Romance languages. Sampling …
LRs remain expensive to create and thus rare relative to demand across languages and technology types. The accidental recreation of an LR that already exists is a nearly unforgivable waste of scarce resources that is unfortunately not so easy to avoid. The number of catalogs the HLT researcher must search, with their different formats, makes it possible to …
This paper describes the collection and transcription of large amounts of Arabic broadcast news speech data. More than 4000 hours of satellite data have been collected from various Arabic sources. The data was recorded from selected Arabic TV and radio stations in both Modern Standard Arabic and dialectal Arabic. Also, close to 2400 hours of data from a …
The analysis of lectures and meetings inside smart rooms has recently attracted much interest in the literature, being the focus of international projects and technology evaluations. A key enabler for progress in this area is the availability of appropriate multimodal … (Footnote: Ambrish Tyagi has contributed to this work during two summer internships with the IBM T. …)