As part of building a fact database, we defined guidelines for extracting named entities, using a taxonomy that extends the usual definition of named entities. We thus defined new entity types with broader coverage, including substantive-based expressions. These extended named entities are hierarchical (with types and …
The evaluation of named entity recognition (NER) methods is an active field of research. This includes the recognition of named entities in speech transcripts. Evaluating NER systems on automatic speech recognition (ASR) output when the human reference annotation was prepared on clean manual transcripts raises difficult alignment issues. These issues are …
This paper reports on Task 1b of the 2015 CLEF eHealth evaluation lab, which extended the information extraction tasks of previous ShARe/CLEF eHealth evaluation labs by considering ten types of entities, including disorders, to be extracted from biomedical text in French. The task consisted of two phases: entity recognition (phase 1), in which …
ABSTRACT In this article, we present the 2012 edition of the DEFT text mining challenge (défi fouille de texte). This edition addresses the automatic keyword indexing of scientific articles through two tracks. The first provides participants with the terminology of the keywords used in the documents to be indexed, whereas the second does not provide this …
Within the framework of the Quaero project, we proposed a new definition of named entities, based upon an extension of the coverage of named entities as well as the structure of those named entities. In this new definition, the extended named entities we proposed are both hierarchical and compositional. In this paper, we focused on the annotation of a …
In this article, we present the methods we used to obtain measures that ensure the quality and well-formedness of a text corpus. These measures allow us to determine whether a corpus is compatible with the processing we want to apply to it. We call this method "corpus certification". These measures are based upon the characteristics required by the …
Our experiments rely on a combination of machine-learning (CRF) and rule-based (HeidelTime) systems. First, a CRF system identifies both EVENTS and TIMEX3, along with polarity values for EVENT and types of TIMEX. Second, the HeidelTime tool identifies DOCTIME and TIMEX3 elements, and computes DocTimeRel for each EVENT identified by the CRF. Third, another …
OBJECTIVE: This paper describes the approaches the authors developed while participating in the i2b2/VA 2010 challenge to automatically extract medical concepts and annotate assertions on concepts and relations between concepts. DESIGN: The authors' approaches rely on both rule-based and machine-learning methods. Natural language processing is used to …
Renewed interest in de-identification (also known as "anonymisation") has led to the development of a series of systems in the United States that perform very well on challenge test sets. De-identification, however, needs to be tuned to local documents and their specificities. We address here two issues raised in this context. First, tuning is …
This paper reports on the 3rd CLEFeHealth evaluation lab, which continues our evaluation resource building activities for the medical domain. In this edition of the lab, we focus on making it easier for patients and nurses to author, understand, and access eHealth information. The 2015 CLEFeHealth evaluation lab was structured into two tasks, focusing on …