Corpus ID: 207853304

CamemBERT: a Tasty French Language Model

@article{Martin2020CamemBERTAT,
  title={CamemBERT: a Tasty French Language Model},
  author={Louis Martin and Benjamin Muller and Pedro Javier Ortiz Su{\'a}rez and Yoann Dupont and Laurent Romary and {\'E}ric de la Clergerie and Djam{\'e} Seddah and Beno{\^i}t Sagot},
  journal={ArXiv},
  year={2020},
  volume={abs/1911.03894}
}
  • Louis Martin, Benjamin Muller, +5 authors Benoît Sagot
  • Published 2020
  • Computer Science
  • ArXiv
  • Pretrained language models are now ubiquitous in Natural Language Processing. Despite their success, most available models have either been trained on English data or on the concatenation of data in multiple languages. This makes practical use of such models—in all languages except English—very limited. Aiming to address this issue for French, we release CamemBERT, a French version of the Bi-directional Encoders for Transformers (BERT). We measure the performance of CamemBERT compared to…
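Since the abstract presents CamemBERT as a released model intended for practical use, a minimal sketch of loading it through the Hugging Face transformers library is given below. The checkpoint name "camembert-base" and the example sentence are illustrative assumptions, not details taken from this page.

# Minimal sketch: loading CamemBERT via Hugging Face transformers.
# The checkpoint name "camembert-base" is assumed for illustration.
from transformers import CamembertModel, CamembertTokenizer

tokenizer = CamembertTokenizer.from_pretrained("camembert-base")
model = CamembertModel.from_pretrained("camembert-base")

# Tokenize a French sentence and run it through the encoder.
inputs = tokenizer("Le camembert est délicieux !", return_tensors="pt")
outputs = model(**inputs)

# One contextual embedding per input token: (batch, seq_len, hidden_size).
print(outputs.last_hidden_state.shape)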

    Citations

    Publications citing this paper.
    SHOWING 1-10 OF 30 CITATIONS

    Project PIAF: Building a Native French Question-Answering Dataset

    CITES BACKGROUND
    HIGHLY INFLUENCED

    FlauBERT: Unsupervised Language Model Pre-training for French

    CITES RESULTS, BACKGROUND & METHODS
    HIGHLY INFLUENCED

    A Systematic Analysis of Morphological Content in BERT Models for Multiple Languages

    CITES METHODS
    HIGHLY INFLUENCED

    FQuAD: French Question Answering Dataset

    CITES BACKGROUND & METHODS
    HIGHLY INFLUENCED

    Multilingual Zero-shot Constituency Parsing

    CITES BACKGROUND
    HIGHLY INFLUENCED

    Pre-training Polish Transformer-based Language Models at Scale

    CITES BACKGROUND
    HIGHLY INFLUENCED

    WikiBERT models: deep transfer learning for many languages

    CITES METHODS & BACKGROUND
    HIGHLY INFLUENCED

    CITATION STATISTICS

    • 10 Highly Influenced Citations

    • Averaged 15 Citations per year from 2019 through 2020

    References

    Publications referenced by this paper.
    SHOWING 1-10 OF 83 REFERENCES

    CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data

    HIGHLY INFLUENTIAL

    75 Languages, 1 Model: Parsing Universal Dependencies Universally

    HIGHLY INFLUENTIAL

    Building a Treebank for French, pages 165–187

    • Anne Abeillé, Lionel Clément, François Toussenel.
    • Kluwer, Dordrecht.
    • 2003
    HIGHLY INFLUENTIAL

    Cross-lingual Language Model Pretraining

    • Guillaume Lample, Alexis Conneau.
    • CoRR, 2019.
    HIGHLY INFLUENTIAL

    Multilingual BERT

    • Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova.
    • https://github.com/google-research/bert/blob/master/multilingual.md
    • 2018
    HIGHLY INFLUENTIAL

    RoBERTa: A Robustly Optimized BERT Pretraining Approach

    HIGHLY INFLUENTIAL