On the State of German (Abstractive) Text Summarization

@inproceedings{Aumiller2023OnTS,
  title={On the State of German (Abstractive) Text Summarization},
  author={Dennis Aumiller and Jing Fan and Michael Gertz},
  booktitle={Datenbanksysteme f{\"u}r Business, Technologie und Web},
  year={2023}
}
With recent advancements in the area of Natural Language Processing, the focus is slowly shifting from a purely English-centric view towards more language-specific solutions, including German. Especially practical for businesses to analyze their growing amount of textual data are text summarization systems, which transform long input documents into compressed and more digestible summary texts. In this work, we assess the particular landscape of German abstractive text summarization and… 

2nd German Text Summarization Challenge

The 2nd German Text Summarization Challenge aimed to explore new ideas and solutions regarding an automatic quality assessment of German text summarizations and asked the participants to consider aspects such as correctness in content and grammar as well as facets like compactness and abstractiveness.

Klexikon: A German Dataset for Joint Summarization and Simplification

This work describes the creation of a new dataset for joint text simplification and summarization, based on German Wikipedia and the German children’s encyclopedia “Klexikon” and consisting of almost 2,900 documents. A document-aligned version is released that particularly highlights the summarization aspect.

Summarization of German Court Rulings

This paper introduces a new dataset consisting of 100k German judgments with short summaries, creates a pre-processing pipeline tailored explicitly to the German legal domain, implements multiple extractive as well as abstractive summarization systems, and builds a wide variety of baseline models.

Error Analysis of using BART for Multi-Document Summarization: A Study for English and German Language

An in-depth error analysis of the approach is performed for both languages, identifying the most notable errors, including made-up facts and topic delimitation, and quantifying the amount of extractiveness.

TeSum: Human-Generated Abstractive Summarization Corpus for Telugu

This work proposes a pipeline that crowd-sources summarization data and then aggressively filters the content via automatic and partial expert evaluation, creating a high-quality Telugu Abstractive Summarization dataset (TeSum) that is validated with sampling-based human evaluation.

Idiap Abstract Text Summarization System for German Text Summarization Task

This work builds an abstract text summarizer for German-language text using the state-of-the-art “Transformer” model and proposes an iterative data augmentation approach that uses synthetic data along with the real summarization data for the German language.

What Makes a Good Summary? Reconsidering the Focus of Automatic Summarization

A survey amongst heavy users of pre-made summaries finds that the current focus of the field does not fully align with participants' wishes, and proposes a methodology to evaluate the usefulness of a summary.

WikiLingua: A New Benchmark Dataset for Cross-Lingual Abstractive Summarization

A method for direct crosslingual summarization without requiring translation at inference time is proposed by leveraging synthetic data and Neural Machine Translation as a pre-training step, which significantly outperforms the baseline approaches, while being more cost efficient during inference.

Sequential Transfer Learning in NLP for German Text Summarization

The experiments suggest that pre-trained language models can improve text summarization, and it is found that using multilingual BERT as contextual embeddings lifts the model by about 9 points of ROUGE-1 and ROUGE-2 on a German summarization task.
...