Lionel Nicolas

In this paper, we introduce the Léxico de Formas Flexionadas del Español (Leffe), a wide-coverage morphological and syntactic Spanish lexicon based on the Alexina lexical framework. We explain how the Leffe was created by merging several heterogeneous lexicons and how the Alexina lexical framework was applied to Spanish. We also introduce…
The effectiveness of parsers based on manually created resources, namely a grammar and a lexicon, relies mostly on the quality of these resources. Thus, increasing parser coverage and precision usually implies improving these two resources. Improving them manually is a time-consuming and complex task: identifying which resource is the true culprit for a…
The MERLIN corpus is a written learner corpus for Czech, German, and Italian that has been designed to illustrate the Common European Framework of Reference for Languages (CEFR) with authentic learner data. The corpus contains 2,290 learner texts produced in standardized language certifications covering CEFR levels A1–C1. The MERLIN annotation scheme…
In a morphological lexicon, each entry combines a lemma with a specific inflection class, often defined by a set of inflection rules. Such lexica therefore usually give a satisfactory account of inflectional operations. Derivational information, however, is usually poorly covered. In this paper we introduce a novel approach for enriching morphological lexica…
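As an illustration of the entry structure this abstract describes (a lemma paired with an inflection class that is defined by a set of rules), here is a minimal Python sketch. The class name `v-ar`, the tag names, and the strip/add rule format are assumptions made for the example; they are not taken from the paper or from any particular lexical framework.

```python
from dataclasses import dataclass

# Hypothetical inflection class table (illustrative only).
# Each rule maps a morphological tag to (suffix to strip, suffix to add).
INFLECTION_CLASSES = {
    "v-ar": {
        "inf": ("", ""),
        "pres.1sg": ("ar", "o"),
        "pres.3sg": ("ar", "a"),
        "pres.1pl": ("ar", "amos"),
    },
}


@dataclass(frozen=True)
class Entry:
    """A lexicon entry: a lemma plus the name of its inflection class."""
    lemma: str
    inflection_class: str


def inflect(entry: Entry) -> dict:
    """Generate the inflected forms of an entry by applying its class rules."""
    rules = INFLECTION_CLASSES[entry.inflection_class]
    forms = {}
    for tag, (strip, add) in rules.items():
        if not entry.lemma.endswith(strip):
            raise ValueError(f"{entry.lemma!r} does not end with {strip!r}")
        # Remove the suffix required by the rule, then attach the new one.
        stem = entry.lemma[: len(entry.lemma) - len(strip)]
        forms[tag] = stem + add
    return forms


if __name__ == "__main__":
    for tag, form in inflect(Entry("hablar", "v-ar")).items():
        print(tag, form)
```

With the Spanish verb "hablar" assigned to this toy class, the rules yield "hablar", "hablo", "habla", and "hablamos". Nothing in the table links "hablar" to a derived word such as a noun, which is the kind of derivational gap the abstract points to.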
In order to produce efficient Natural Language Processing (NLP) tools, reliable linguistic resources are a preliminary requirement. When available for a given language, the resources are generally far below the expectations in terms of quality, coverage or usability. This paper presents a project whose ambition is to enhance the production capacities of…
The coverage of a parser depends mostly on the quality of the underlying grammar and lexicon. Developing a lexicon that is both complete and accurate is an intricate and demanding task. We introduce an automatic process for detecting missing, incomplete and erroneous entries in a morphological and syntactic lexicon, and for suggesting correction hypotheses…
Since its publication in 2001, the Common European Framework of Reference for Languages (CEFR) has gained a leading role as an instrument of reference for language teaching and certification. Nonetheless, there is a growing concern about CEFR levels being insufficiently illustrated in terms of authentic learner data. Such concern grows even stronger when…
In natural language processing, many practical tasks, such as speech recognition, information retrieval and machine translation, depend on a large vocabulary and statistical language models. For morphologically rich languages, such as Finnish and Turkish, the construction of a vocabulary and language models with sufficient coverage is particularly…