Corpus ID: 216868259

MLSUM: The Multilingual Summarization Corpus

@article{Scialom2020MLSUMTM,
  title={MLSUM: The Multilingual Summarization Corpus},
  author={Thomas Scialom and Paul-Alexis Dray and Sylvain Lamprier and Benjamin Piwowarski and Jacopo Staiano},
  journal={ArXiv},
  year={2020},
  volume={abs/2004.14900}
}
  • Thomas Scialom, Paul-Alexis Dray, Sylvain Lamprier, Benjamin Piwowarski, Jacopo Staiano
  • Published 2020
  • Computer Science
  • ArXiv
  • We present MLSUM, the first large-scale MultiLingual SUMmarization dataset. Obtained from online newspapers, it contains 1.5M+ article/summary pairs in five different languages: French, German, Spanish, Russian, and Turkish. Together with English newspapers from the popular CNN/Daily Mail dataset, the collected data form a large-scale multilingual dataset which can enable new research directions for the text summarization community. We report cross-lingual comparative analyses based on…

