mT5: A massively multilingual pre-trained text-to-text transformer
@article{Xue2020mT5AM,
  title   = {mT5: A massively multilingual pre-trained text-to-text transformer},
  author  = {Linting Xue and Noah Constant and A. Roberts and Mihir Kale and Rami Al-Rfou and Aditya Siddhant and A. Barua and Colin Raffel},
  journal = {ArXiv},
  year    = {2020},
  volume  = {abs/2010.11934}
}
The recent "Text-to-Text Transfer Transformer" (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We describe the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual benchmarks. All of the code and model checkpoints used in this work are publicly available.
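Because mT5 keeps T5's unified text-to-text format, every task reduces to feeding a string in and decoding a string out. Below is a minimal sketch of that interface, assuming the Hugging Face `transformers` library and its `google/mt5-small` checkpoint; neither is named on this page, so treat both as illustrative.

```python
# Minimal sketch of mT5's text-to-text interface (assumes the Hugging Face
# `transformers` library and the `google/mt5-small` checkpoint).
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

# mT5 is pre-trained with T5's span-corruption objective: the model fills in
# sentinel-marked spans, so input and output are both plain text.
text = "The capital of France is <extra_id_0>."
inputs = tokenizer(text, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Note that the raw pre-trained checkpoint has only seen the unsupervised span-corruption task, so, as with T5, it needs fine-tuning on a downstream task before its generations are useful.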
7 Citations
- XLM-T: Scaling up Multilingual Machine Translation with Pretrained Cross-lingual Transformer Encoders. ArXiv, 2020.
- Leveraging ParsBERT and Pretrained mT5 for Persian Abstractive Text Summarization. ArXiv, 2020.
- How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models. ArXiv, 2020.
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity. 2021.
References
Showing 1-10 of 52 references
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. J. Mach. Learn. Res., 2020.
- SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing. EMNLP, 2018. (Highly Influential)
- FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding. ArXiv, 2020.
- CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data. LREC, 2020.
- InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training. ArXiv, 2020.