We present our participation in Task 1a of the 2013 CLEF-eHEALTH Challenge, whose goal was the identification of disorder named entities in electronic medical records. We developed a supervised CRF model that learns to predict disorder named entities from a rich set of features. The CRF system uses external knowledge from specialized biomedical...
Whereas multilingual comparable corpora have been used to identify translations of words or terms, monolingual corpora can help identify paraphrases. The present work addresses paraphrases found between two different discourse types: specialized and lay texts. We therefore built comparable corpora of specialized and lay texts in order to detect equivalent...
Nowadays a large amount of health information is available to the public, but medical language is often difficult for lay people to understand. Developing means to make medical information more comprehensible is therefore a real need. In this regard, a useful resource would be a corpus of specialized and lay paraphrases. To this end we built comparable...
OBJECTIVE: To present a series of experiments: (1) to evaluate the impact of pre-annotation on the speed of manual annotation of clinical trial announcements; and (2) to test for potential bias when pre-annotation is used. METHODS: To build the gold standard, 1400 clinical trial announcements from the clinicaltrials.gov website were randomly selected and...
We translated an existing English negation lexicon (NegEx) into Swedish, French, and German and compared the lexicon on corpora from each language. We observed Zipf's law for all languages, i.e., a few phrases occur a large number of times, and a large number of phrases occur only a few times. The negation triggers "no" and "not" were common for all languages; ...
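The kind of rank-frequency count behind such an observation can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the trigger list is a tiny hypothetical sample (not the actual NegEx lexicon), the sentences are invented, and the function names `count_triggers` and `rank_frequency` are introduced here for the example only.

```python
import re
from collections import Counter

# Hypothetical, tiny trigger list for illustration; NOT the actual NegEx lexicon.
TRIGGERS = ["no", "not", "without", "denies", "no evidence of"]


def count_triggers(sentences, triggers=TRIGGERS):
    """Count whole-word, case-insensitive occurrences of each trigger phrase."""
    # Put longer phrases first so "no evidence of" is not counted as plain "no".
    ordered = sorted(triggers, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )
    counts = Counter()
    for sentence in sentences:
        for match in pattern.finditer(sentence):
            counts[match.group(1).lower()] += 1
    return counts


def rank_frequency(counts):
    """Return (rank, phrase, frequency) tuples, most frequent phrase first."""
    return [
        (rank, phrase, freq)
        for rank, (phrase, freq) in enumerate(counts.most_common(), start=1)
    ]


if __name__ == "__main__":
    # Invented example sentences, used only to exercise the functions.
    toy_corpus = [
        "No fever.",
        "Patient denies pain, no cough.",
        "Not consistent with pneumonia; no evidence of infection.",
    ]
    for row in rank_frequency(count_triggers(toy_corpus)):
        print(row)
```

On a real corpus, a Zipf-like distribution would show up as a handful of triggers at the top ranks with very high counts, followed by a long tail of phrases that each occur only a few times.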
OBJECTIVE: The goal of this study is to assist the translation of a medical terminology (MedlinePlus) into French. METHODS: We combined two types of approaches to acquire French translations of English MedlinePlus terms. The first is knowledge-based and relies on the conceptual information of the UMLS Metathesaurus. The second method is a corpus-based NLP...
This paper presents the SeeDev Task of the BioNLP Shared Task 2016. The purpose of the SeeDev Task is to extract, from scientific articles, descriptions of the genetic and molecular mechanisms involved in seed development of the model plant Arabidopsis thaliana. The SeeDev task consists of extracting many different event types that involve a...
This paper presents the Bacteria Biotope task of the BioNLP Shared Task 2016, which follows the previous 2013 and 2011 editions. The task focuses on the extraction of the locations (biotopes and geographical places) of bacteria from PubMed abstracts and the characterization of bacteria and their associated habitats with respect to reference knowledge...
In previous work, we presented a preliminary study to identify paraphrases between technical and lay discourse types from medical corpora dedicated to the French language. In this paper, we test the hypothesis that the same kinds of paraphrases as for French can be detected between English technical and lay discourse types and report the adaptation of our...
OBJECTIVE: (1) To evaluate a state-of-the-art natural language processing (NLP)-based approach to automatically de-identify a large set of diverse clinical notes. (2) To measure the impact of de-identification on the performance of information extraction algorithms on the de-identified documents. MATERIAL AND METHODS: A cross-sectional study that included...