Corpus ID: 236428949

One Question Answering Model for Many Languages with Cross-lingual Dense Passage Retrieval

Akari Asai, Xinyan Velocity Yu, Jungo Kasai, Hannaneh Hajishirzi
Neural Information Processing Systems
We present Cross-lingual Open-Retrieval Answer Generation (CORA), the first unified many-to-many question answering (QA) model that can answer questions across many languages, even for ones without language-specific annotated data or knowledge sources. We introduce a new dense passage retrieval algorithm that is trained to retrieve documents across languages for a question. Combined with a multilingual autoregressive generation model, CORA answers directly in the target language without any… 

ZusammenQA: Data Augmentation with Specialized Models for Cross-lingual Open-retrieval Question Answering System

The proposed system for the MIA Shared Task on Cross-lingual Open-retrieval Question Answering (COQA) is introduced, showing that language- and domain-specialization as well as data augmentation help, especially for low-resource languages.

CL-ReLKT: Cross-lingual Language Knowledge Transfer for Multilingual Retrieval Question Answering

This paper proposes a novel CL-ReQA method utilizing the concept of language knowledge transfer and a new cross-lingual consistency training technique to create a multilingual embedding space for ReQA that outperforms competitors in 19 out of 21 settings of CL-ReQA.

Cross-Lingual Open-Domain Question Answering with Answer Sentence Generation

A cross-lingual generative model that produces full-sentence answers by exploiting passages written in multiple languages, including languages different from the question, outperforms answer sentence selection baselines for all five languages and monolingual generative pipelines for three out of five languages studied.

Cross-Lingual GenQA: A Language-Agnostic Generative Question Answering Approach for Open-Domain Question Answering

The Gen-TyDiQA dataset is presented, which extends the TyDiQA evaluation data with natural-sounding, well-formed answers in Arabic, Bengali, English, Japanese, and Russian, and it is shown that a GenQA sequence-to-sequence-based model outperforms a state-of-the-art Answer Sentence Selection model.

MIA 2022 Shared Task: Evaluating Cross-lingual Open-Retrieval Question Answering for 16 Diverse Languages

The results of the Workshop on Multilingual Information Access 2022 Shared Task, evaluating cross-lingual open-retrieval question answering (QA) systems in 16 typologically diverse languages, are presented, with the best system obtaining particularly significant improvements in Tamil.

Augmenting Passage Representations with Query Generation for Enhanced Cross-Lingual Dense Retrieval

This paper proposes to use cross-lingual query generation to augment passage representations with queries in languages other than the original passage language so that the representation can encode more information across the different target languages.

Cross-Lingual GenQA: Open-Domain Question Answering with Answer Sentence Generation

This paper introduces Gen-TyDiQA, an extension of the TyDiQA dataset with well-formed and complete answers for Arabic, Bengali, English, Japanese, and Russian questions, and presents the first cross-lingual answer sentence generation system (Cross-Lingual GenQA).

Evaluating and Modeling Attribution for Cross-Lingual Question Answering

This work is the first to study attribution for cross-lingual question answering and finds that Natural Language Inference models and PaLM 2 fine-tuned on a very small amount of attribution data can accurately detect attribution.

Ask Me Anything in Your Native Language

This work presents a novel approach based on a single encoder for query and passage for retrieval from a multilingual collection, together with a cross-lingual generative reader, that achieves a new state of the art in both retrieval and end-to-end tasks on the XOR TyDi dataset.

Zero-shot cross-lingual open domain question answering

This paper employs a passage reranker, the fusion-in-decoder technique for generation, and a Wikidata entity-based post-processing system to tackle the inability to generate entities across all languages.

XOR QA: Cross-lingual Open-Retrieval Question Answering

This work constructs a large-scale dataset built on 40K information-seeking questions across 7 diverse non-English languages that TyDi QA could not find same-language answers for and introduces a task framework, called Cross-lingual Open-Retrieval Question Answering (XOR QA), that consists of three new tasks involving cross-lingual document retrieval from multilingual and English resources.

MKQA: A Linguistically Diverse Benchmark for Multilingual Open Domain Question Answering

Multilingual Knowledge Questions and Answers is introduced, an open-domain question answering evaluation set comprising 10k question-answer pairs aligned across 26 typologically diverse languages, making results comparable across languages and independent of language-specific passages.

TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages

A quantitative analysis of the data quality and example-level qualitative linguistic analyses of observed language phenomena that would not be found in English-only corpora are presented.

REALM: Retrieval-Augmented Language Model Pre-Training

The effectiveness of Retrieval-Augmented Language Model pre-training (REALM) is demonstrated by fine-tuning on the challenging task of Open-domain Question Answering (Open-QA) and is found to outperform all previous methods by a significant margin, while also providing qualitative benefits such as interpretability and modularity.

Towards Zero-Shot Multilingual Synthetic Question and Answer Generation for Cross-Lingual Reading Comprehension

This work proposes a simple method to generate large amounts of multilingual question and answer pairs by a single generative model, thus removing the need for human annotations in the target languages.

Entity Linking in 100 Languages

A new formulation for multilingual entity linking is proposed, in which language-specific mentions resolve to a language-agnostic Knowledge Base, and the model outperforms state-of-the-art results from a far more limited cross-lingual linking task.

Cross-lingual Retrieval for Iterative Self-Supervised Training

This work found that cross-lingual alignment can be further improved by training seq2seq models on sentence pairs mined using their own encoder outputs, and developed a new approach, cross-lingual retrieval for iterative self-supervised training (CRISS), where mining and training processes are applied iteratively, improving cross-lingual alignment and translation ability at the same time.

75 Languages, 1 Model: Parsing Universal Dependencies Universally

It is found that fine-tuning a multilingual BERT self-attention model pretrained on 104 languages can meet or exceed state-of-the-art UPOS, UFeats, Lemmas, (and especially) UAS, and LAS scores, without requiring any recurrent or language-specific components.

Low-Resource Parsing with Crosslingual Contextualized Representations

The non-contextual part of the learned language models is examined to demonstrate that polyglot language models better encode crosslingual lexical correspondence compared to aligned monolingual language models, providing further evidence that polyglot training is an effective approach to crosslingual transfer.

Learning to Translate for Multilingual Question Answering

This paper builds a feature for each combination of translation direction and method, and trains a model that learns optimal feature weights on a large forum dataset consisting of posts in English, Arabic, and Chinese.