Corpus ID: 220347531

Playing with Words at the National Library of Sweden - Making a Swedish BERT

Martin Malmsten, Love Börjeson, Chris Haffenden
This paper introduces the Swedish BERT ("KB-BERT") developed by the KBLab for data-driven research at the National Library of Sweden (KB). Building on recent efforts to create transformer-based BERT models for languages other than English, we explain how we used KB's collections to create and train a new language-specific BERT model for Swedish. We also present the results of our model in comparison with existing models - chiefly that produced by the Swedish Public Employment Service…
9 Citations
Large-Scale Contextualised Language Modelling for Norwegian
ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic
EMBEDDIA Tools, Datasets and Challenges: Resources and Hackathon Contributions
mT5: A massively multilingual pre-trained text-to-text transformer


How multilingual is Multilingual BERT?
HuggingFace's Transformers: State-of-the-art Natural Language Processing
SALDO: a touch of yin to WordNet's yang
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
Living Language: An Introduction to Linguistic Anthropology
Imagined Communities: Reflections on the Origin and Spread of Nationalism
Open Sourcing German BERT: Insights into pre-training BERT from scratch (2019)
spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing (2017)