CamemBERT: a Tasty French Language Model
This paper investigates the feasibility of training monolingual Transformer-based language models for languages other than English, taking French as an example and evaluating the resulting models on part-of-speech tagging, dependency parsing, named entity recognition and natural language inference tasks.
What Does BERT Learn about the Structure of Language?
This work provides novel support for the possibility that BERT networks capture structural information about language by performing a series of experiments to unpack the elements of English language structure learned by BERT.
Universal Dependencies 2.1
The annotation scheme is based on (universal) Stanford dependencies, Google universal part-of-speech tags, and the Interset interlingua for morphosyntactic tagsets.
Overview of the SPMRL 2013 Shared Task: A Cross-Framework Evaluation of Parsing Morphologically Rich Languages
This paper presents and analyzes parsing results obtained by the task participants, and provides an analysis and comparison of the parsers across languages and frameworks, reported for gold input as well as more realistic parsing scenarios.
Statistical Parsing of Morphologically Rich Languages (SPMRL) What, How and Whither
This paper synthesizes the contributions of researchers working on parsing Arabic, Basque, French, German, Hebrew, Hindi and Korean to point out shared solutions across languages, and is intended as a source of directions for future investigations.
Introducing the SPMRL 2014 Shared Task on Parsing Morphologically-rich Languages
This paper provides a short overview of the 2014 SPMRL shared task goals, data sets, and evaluation setup; the description of participating systems and the analysis of their results are covered in (Seddah et al., 2014).
Le corpus Sequoia : annotation syntaxique et exploitation pour l’adaptation d’analyseur par pont lexical (The Sequoia Corpus: Syntactic Annotation and Use for a Parser Lexical Domain Adaptation)
This article presents the construction methodology and the characteristics of the Sequoia corpus, a French corpus syntactically annotated according to an annotation scheme very close to…
Simple, Interpretable and Stable Method for Detecting Words with Usage Change across Corpora
This work proposes an alternative approach that does not use vector space alignment, and instead considers the neighbors of each word, and demonstrates its effectiveness in 9 different setups, considering different corpus splitting criteria.
Universal Dependencies 2.0 – CoNLL 2017 Shared Task Development and Test Data
This release contains the test data used in the CoNLL 2017 shared task on parsing Universal Dependencies, and complements the UD 2.0 release with 18 new parallel test sets and 4 test sets in surprise languages.
Parsing Morphologically Rich Languages: Introduction to the Special Issue
This special issue reports on methods that successfully address the challenges involved in parsing a range of morphologically rich languages (MRLs); the introduction describes those challenges and outlines the contributions of the articles in the issue.