The French Technolangue MEDIA-EVALDA project aims to evaluate spoken understanding approaches. This paper describes the semantic annotation scheme of a common dialog corpus which will be used for developing and evaluating spoken understanding models and for linguistic studies. A common semantic representation has been formalized and agreed upon by the …
The LIMSI ARISE system provides vocal access to rail travel information for main French intercity connections, including timetables, simulated fares and reservations, reductions and services. Our goal is to obtain high dialog success rates with a very open structure, where the user is free to ask any question or to provide any information at any point in …
Within the framework of the construction of a fact database, we defined guidelines to extract named entities, using a taxonomy based on an extension of the usual named entities definition. We thus defined new types of entities with broader coverage including substantive-based expressions. These extended named entities are hierarchical (with types and …
The evaluation of named entity recognition (NER) methods is an active field of research. This includes the recognition of named entities in speech transcripts. Evaluating NER systems on automatic speech recognition (ASR) output when the human reference annotation was prepared on clean manual transcripts raises difficult alignment issues. These issues are …
This paper presents and reports on the progress of the EVALDA/MEDIA project, focusing on the recording and annotating protocol of the reference dialogue corpus. The aim of this project is to design and test an evaluation methodology to compare and diagnose the context-dependent and independent understanding capability of spoken language dialogue systems.
Pitch perception for short-duration fundamental frequency (F0) glissandos was studied. In the first part, new measurements using the method of adjustment are reported. Stimuli were F0 glissandos centered at 220 Hz. The parameters under study were: F0 glissando extents (0, 0.8, 1.5, 3, 6, and 12 semitones, i.e., 0, 10.17, 18.74, 38.17, 76.63, and 155.56 Hz), …
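The semitone-to-Hz correspondence quoted in the abstract above can be checked with a short sketch. It assumes (this is not stated in the abstract) that each glissando is placed symmetrically around 220 Hz on a logarithmic frequency scale; the function name `glissando_extent_hz` is hypothetical. Under that assumption the sketch reproduces the listed values for 0, 0.8, 3, 6, and 12 semitones to two decimals, while 1.5 semitones comes out at about 19.07 Hz rather than the listed 18.74 Hz, so the stimuli may have used a slightly different extent or convention for that condition.

```python
def glissando_extent_hz(center_hz: float, extent_semitones: float) -> float:
    """Width in Hz of a glissando spanning `extent_semitones`.

    Assumption: the glide is centered on `center_hz` on a log-frequency
    scale, i.e. it runs from half the extent below the center to half
    the extent above it (12 semitones = one octave = a factor of 2).
    """
    half = extent_semitones / 2.0
    upper = center_hz * 2.0 ** (half / 12.0)
    lower = center_hz * 2.0 ** (-half / 12.0)
    return upper - lower


if __name__ == "__main__":
    # Extents listed in the abstract, centered at 220 Hz.
    for semitones in (0, 0.8, 1.5, 3, 6, 12):
        width = glissando_extent_hz(220.0, semitones)
        print(f"{semitones:>4} semitones -> {width:.2f} Hz")
```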
This paper presents a new paradigm of "challenge" evaluation of Spoken Language Understanding. This methodology aims at a quantitative assessment with a high diagnostic power, in contrast to standard ATIS-like frameworks. This paper details the methodology as well as the results of an evaluation campaign held by the French CNRS research agency. The …
In this paper we present the three LIMSI question-answering systems for speech transcripts that participated in the QAst 2009 evaluation. These systems are based on a complete and multi-level analysis of both queries and documents. These systems use an automatically generated research descriptor. A score based on those descriptors is used to select …
In this paper we focus on the named entity recognition task in spoken data. The proposed approach investigates the use of various contexts of the words to improve recognition. Experiments on speech data from French broadcast news, using conditional random fields (CRF), show that the use of semantic information, generated using symbolic …
Within the framework of the Quaero project, we proposed a new definition of named entities, based upon an extension of the coverage of named entities as well as the structure of those named entities. In this new definition, the extended named entities we proposed are both hierarchical and compositional. In this paper, we focused on the annotation of a …