Corpus ID: 233210701

Building a Swedish Open-Domain Conversational Language Model

@inproceedings{Norlund2021BuildingAS,
  title={Building a Swedish Open-Domain Conversational Language Model},
  author={Tobias Norlund and Agnes Stenbom},
  booktitle={NODALIDA},
  year={2021}
}
We present ongoing work on evaluating what is, to our knowledge, the first large generative language model trained to converse in Swedish, using data from the online discussion forum Flashback. We conduct a human evaluation pilot study that indicates the model is often able to respond to conversations in both a human-like and informative manner, on a diverse set of topics. While data from online forums can be useful for building conversational systems, we reflect on the negative consequences that…

Lessons Learned from GPT-SW3: Building the First Large-Scale Generative Language Model for Swedish

The results of quantitative evaluation through perplexity indicate that GPT-SW3 is a competent model in comparison with existing autoregressive models of similar size.
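
Perplexity here is the exponentiated average negative log-likelihood the model assigns to held-out text; lower is better. A minimal sketch of that computation (the token log-probabilities below are illustrative, not scores from GPT-SW3):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood) over a held-out token sequence."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Example: four tokens, each assigned probability 0.25 by the model -> perplexity 4.0
print(perplexity([math.log(0.25)] * 4))
```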

References

Showing 1-10 of 25 references

Recipes for Building an Open-Domain Chatbot (2020)

Towards a Human-like Open-Domain Chatbot

Meena, a multi-turn open-domain chatbot trained end-to-end on data mined and filtered from public domain social media conversations, is presented, and a human evaluation metric called Sensibleness and Specificity Average (SSA) is proposed, which captures key elements of a human-like multi-turn conversation.
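
SSA is the average of two per-response binary judgments from human raters: is the response sensible, and is it specific to the context? A minimal sketch of that aggregation (the rating format and field names are assumptions for illustration, not the paper's exact annotation schema):

```python
from statistics import mean

def ssa(ratings):
    """Sensibleness and Specificity Average: the mean of the two per-response rates.

    `ratings` is a list of human judgments, each a dict with boolean
    'sensible' and 'specific' fields (an assumed format, for illustration only).
    """
    sensibleness = mean(1.0 if r["sensible"] else 0.0 for r in ratings)
    specificity = mean(1.0 if r["specific"] else 0.0 for r in ratings)
    return (sensibleness + specificity) / 2.0

# Example: three rated responses -> sensibleness 2/3, specificity 1/3, SSA 0.5
print(ssa([
    {"sensible": True, "specific": True},
    {"sensible": True, "specific": False},
    {"sensible": False, "specific": False},
]))
```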

DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation

It is shown that conversational systems that leverage DialoGPT generate more relevant, contentful and context-consistent responses than strong baseline systems.

Attention is All you Need

A new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it also generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.
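
The core operation the Transformer builds on is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, applied in parallel across multiple heads. A minimal NumPy sketch of a single head (shapes and values are illustrative):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)   # (batch, q_len, k_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over key positions
    return weights @ V                                   # (batch, q_len, d_v)

# Toy example: batch of 1, 3 query positions, 4 key/value positions, d_k = d_v = 8
rng = np.random.default_rng(0)
Q = rng.normal(size=(1, 3, 8))
K = rng.normal(size=(1, 4, 8))
V = rng.normal(size=(1, 4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (1, 3, 8)
```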

DeepSpeed: System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters

Describes new techniques in Microsoft's open-source library DeepSpeed, which advances large model training by improving scale, speed, cost, and usability, unlocking the ability to train models with over 100 billion parameters.

Is Machine Learning Speaking my Language? A Critical Look at the NLP-Pipeline Across 8 Human Languages

A team including speakers of 8 languages takes a critical look at the typical NLP pipeline and shows that, even when a language is technically supported, substantial caveats remain that prevent full participation.

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.

The Woman Worked as a Babysitter: On Biases in Language Generation

The notion of regard towards a demographic is introduced, the varying levels of regard towards different demographics are used as a defining metric for bias in NLG, and the extent to which sentiment scores are a relevant proxy for regard is analyzed.

Artificial intelligence and communication: A Human–Machine Communication research agenda

This article provides a starting point for articulating the differences between communicative AI and previous technologies and introduces a theoretical basis for navigating these conditions in the form of scholarship within human–machine communication (HMC).

The Curious Case of Neural Text Degeneration

By sampling text from the dynamic nucleus of the probability distribution, which allows for diversity while effectively truncating the less reliable tail of the distribution, the resulting text more closely matches the quality of human text, yielding enhanced diversity without sacrificing fluency and coherence.
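
This is top-p (nucleus) sampling: keep the smallest set of tokens whose cumulative probability reaches a threshold p, renormalize, and sample only from that set. A minimal NumPy sketch (the threshold and toy distribution are illustrative):

```python
import numpy as np

def nucleus_sample(probs, p=0.9, rng=np.random.default_rng()):
    """Top-p (nucleus) sampling from a next-token distribution `probs`."""
    order = np.argsort(probs)[::-1]                        # token ids, most probable first
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, p)) + 1       # size of the smallest set reaching p
    nucleus = order[:cutoff]
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()  # renormalize within the nucleus
    return int(rng.choice(nucleus, p=nucleus_probs))

# Toy next-token distribution over a 5-token vocabulary
probs = np.array([0.5, 0.2, 0.15, 0.1, 0.05])
print(nucleus_sample(probs, p=0.9))  # samples one of the top four tokens
```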