Introducing the Welsh Text Summarisation Dataset and Baseline Systems

Ignatius M Ezeani, Mahmoud El-Haj, Jonathan Morris, Dawn Knight
Welsh is an official language in Wales and is spoken by an estimated 884,300 people (29.2% of the population of Wales). Despite this status and estimated increase in speaker numbers since the last (2011) census, Welsh remains a minority language undergoing revitalisation and promotion by Welsh Government and relevant stakeholders. As part of the effort to increase the availability of Welsh digital technology, this paper introduces the first Welsh summarisation dataset, which we provide freely… 

Creation of an Evaluation Corpus and Baseline Evaluation Scores for Welsh Text Summarisation

The first human vs metrics Welsh summarisation evaluation results and dataset are introduced, which will serve as benchmarks for the development of summarisers and evaluation metrics in other minority language contexts.



Using a Keyness Metric for Single and Multi Document Summarisation

This paper reports the results of participating in the MultiLing 2013 summarisation tasks with single-document and multi-document corpus-based summarisers for both Arabic and English, and shows how these systems performed in the automatic evaluation.

Creating Welsh Language Word Embeddings

This study adapted two existing methods, word2vec and fastText, to automatically learn Welsh word embeddings, taking into account the syntactic and morphological idiosyncrasies of the language, and conducted both qualitative and quantitative evaluation of the resulting word embeddings.

Leveraging Pre-Trained Embeddings for Welsh Taggers

The results of the experiments on learning a simple multi-task neural network model for part-of-speech and semantic tagging for Welsh using a pre-trained embedding model from FastText are presented.

Multi-document Arabic text summarisation

This work addresses the lack of Arabic multi-document corpora for summarisation and the absence of automatic and manual Arabic gold-standard summaries, and demonstrates the use of Google Translate in creating an Arabic version of the DUC-2002 dataset.

The Financial Narrative Summarisation Shared Task FNS 2021

This paper presents the results and findings of the Financial Narrative Summarisation Shared Task on summarising UK annual reports. The shared task was organised as part of the Financial Narrative

Joint abstractive and extractive method for long financial document summarization

This work proposes an end-to-end financial narrative summarization system that first selects salient sentences from the document and then paraphrases the extracted sentences to generate a concise overall summary that maximises the ROUGE metric against the gold-standard summary.

A Survey on Deep Learning-Based Automatic Text Summarization Models

This survey reviews deep learning-based text summarization models, which perform well compared with conventional techniques.

Abstractive text summarization using LSTM-CNN based deep learning

Experimental results on the datasets CNN and DailyMail show that the proposed ATSDL framework outperforms the state-of-the-art models in terms of both semantics and syntactic structure, and achieves competitive results on manual linguistic quality evaluation.

Multi-Document Summarization By Sentence Extraction

This paper discusses a text extraction approach to multi-document summarization that builds on single-document summarization methods by using additional, available information about the document set

ROUGE 2.0: Updated and Improved Measures for Evaluation of Summarization Tasks

ROUGE 2.0 is introduced, which has several updated measures of ROUGE: ROUGE-N+Synonyms, ROUGE-Topic, ROUGE-Topic+Synonyms, ROUGE-TopicUniq and ROUGE-TopicUniq+Synonyms, all of which are improvements over the core ROUGE measures.