CamemBERT: a Tasty French Language Model
TLDR
This paper investigates the feasibility of training monolingual Transformer-based language models for languages other than English, taking French as an example and evaluating the resulting language models on part-of-speech tagging, dependency parsing, named entity recognition and natural language inference tasks.
What Does BERT Learn about the Structure of Language?
TLDR
This work provides novel support for the possibility that BERT networks capture structural information about language by performing a series of experiments to unpack the elements of English language structure learned by BERT.
The Lefff, a Freely Available and Large-coverage Morphological and Syntactic Lexicon for French
TLDR
This paper introduces the Lefff, a freely available, accurate and large-coverage morphological and syntactic lexicon for French, used in many NLP tools such as large-coverage parsers.
Building a free French wordnet from multilingual resources
This paper describes the automatic construction of a freely available wordnet for French (WOLF) based on Princeton WordNet (PWN) by using various multilingual resources. Polysemous words were dealt with an...
Universal Dependencies 2.1
TLDR
The annotation scheme is based on (universal) Stanford dependencies, Google universal part-of-speech tags, and the Interset interlingua for morphosyntactic tagsets.
Coupling an Annotated Corpus and a Morphosyntactic Lexicon for State-of-the-Art POS Tagging with Less Human Effort
TLDR
It is found that the use of a lexicon improves the quality of the tagger at any stage of development of either resource, and that for fixed performance levels the availability of the full lexicon consistently reduces the need for supervised data by at least one half.
Coupling an annotated corpus and a lexicon for state-of-the-art POS tagging
TLDR
It is found that the use of a lexicon improves the quality of the tagger at any stage of development of either resource, and that for fixed performance levels the availability of the full lexicon consistently reduces the need for supervised data by at least one half.
Morphology Based Automatic Acquisition of Large-coverage Lexica
TLDR
A new technique for constructing wide-coverage morphological lexica from large corpora and morphological knowledge, with an application to French, that relies on the idea that the existence of a hypothetical lemma can be guessed if several different words found in the corpus are best interpreted as morphological variants of this lemma.
Controllable Sentence Simplification
TLDR
This work adapts a discrete parametrization mechanism that provides explicit control over simplification systems based on Sequence-to-Sequence models, establishing the state of the art at 41.87 SARI on the WikiLarge test set, a +1.42 improvement over the best previously reported score.
Developing a French FrameNet: Methodology and First results
TLDR
Focusing on a set of notional domains, the authors delimited a subset of English frames, adapted them to French data when necessary, and developed the corresponding French lexicon, believing that working domain by domain helped to enforce the coherence of the resulting resource.