DTW at Qur'an QA 2022: Utilising Transfer Learning with Transformers for Question Answering in a Low-resource Domain

@article{Premasiri2022DTWAQ,
  title={DTW at Qur'an QA 2022: Utilising Transfer Learning with Transformers for Question Answering in a Low-resource Domain},
  author={Damith Premasiri and Tharindu Ranasinghe and Wajdi Zaghouani and Ruslan Mitkov},
  journal={ArXiv},
  year={2022},
  volume={abs/2205.06025}
}
The task of machine reading comprehension (MRC) is a useful benchmark for evaluating the natural language understanding of machines. It has gained popularity in natural language processing (NLP) largely due to the many datasets released across languages. However, MRC remains understudied in several domains, including religious texts. The Qur'an QA 2022 shared task aims to fill this gap by producing state-of-the-art question answering and reading…
