Author pages are created from data sourced from our academic publisher partnerships and public sources.
Share This Author
Ungoliant: An Optimized Pipeline for the Generation of a Very Large-Scale Multilingual Web Corpus
Since the introduction of large language models in Natural Language Processing, large raw corpora have played a crucial role in Computational Linguistics. However, most of these large raw corpora are… Expand