DTW at Qur'an QA 2022: Utilising Transfer Learning with Transformers for Question Answering in a Low-resource Domain

Damith Premasiri, Tharindu Ranasinghe, Wajdi Zaghouani and Ruslan Mitkov
Machine reading comprehension (MRC) is a useful benchmark for evaluating the natural language understanding of machines. It has gained popularity in natural language processing (NLP) largely because of the many datasets released for a wide range of languages. However, MRC remains understudied in several domains, including religious texts. The Qur'an QA 2022 shared task aims to fill this gap by producing state-of-the-art question answering and reading…

References
FQuAD: French Question Answering Dataset
This work introduces the French Question Answering Dataset (FQuAD), a native French reading comprehension dataset of questions and answers over a set of Wikipedia articles, comprising 25,000+ samples across its 1.0 and 1.1 versions.
Neural Arabic Question Answering
The system for open domain question answering in Arabic (SOQAL) is based on two components: a document retriever using a hierarchical TF-IDF approach and a neural reading comprehension model using the pre-trained bi-directional transformer BERT.
AyaTEC: Building a Reusable Verse-Based Test Collection for Arabic Question Answering on the Holy Qur'an
This article introduces AyaTEC, a reusable test collection for verse-based question answering on the Holy Qur'an, which serves as a common experimental testbed for this task and proposes several evaluation measures to support the different types of questions and the nature of verse-based answers while integrating the concept of partial matching of answers in the evaluation.
AraBERT: Transformer-based Model for Arabic Language Understanding
This paper pre-trained BERT specifically for the Arabic language in the pursuit of achieving the same success that BERT did for the English language, and showed that the newly developed AraBERT achieved state-of-the-art performance on most tested Arabic NLP tasks.
An Exploratory Analysis of Multilingual Word-Level Quality Estimation with Cross-Lingual Transformers
The findings suggest that the word-level QE models based on powerful pre-trained transformers that are proposed in this paper generalise well across languages, making them more useful in real-world scenarios.
A Survey on Machine Reading Comprehension Systems
It is demonstrated that the focus of research has changed in recent years from answer extraction to answer generation, from single to multi-document reading comprehension, and from learning from scratch to using pre-trained embeddings.
MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text
MCTest is presented, a freely available set of stories and associated questions intended for research on the machine comprehension of text that requires machines to answer multiple-choice reading comprehension questions about fictional stories, directly tackling the high-level goal of open-domain machine comprehension.
SQuAD: 100,000+ Questions for Machine Comprehension of Text
The Stanford Question Answering Dataset (SQuAD), consisting of 100,000+ crowd-sourced questions on Wikipedia articles, is presented; a strong logistic regression model achieves an F1 score of 51.0%, a significant improvement over a simple baseline (20%) but still well below human performance.
A Survey on Machine Reading Comprehension: Tasks, Evaluation Metrics, and Benchmark Datasets
A more precise classification of MRC tasks along four attributes is proposed, and a substantial gap between existing MRC models and genuine human-level reading comprehension is demonstrated.