Common Sense Beyond English: Evaluating and Improving Multilingual Language Models for Commonsense Reasoning

@article{Lin2021CommonSB,
  title={Common Sense Beyond English: Evaluating and Improving Multilingual Language Models for Commonsense Reasoning},
  author={Bill Yuchen Lin and Seyeon Lee and Xiaoyang Qiao and Xiang Ren},
  journal={ArXiv},
  year={2021},
  volume={abs/2106.06937}
}
Commonsense reasoning research has so far been limited to English. We aim to evaluate and improve popular multilingual language models (ML-LMs) to help advance commonsense reasoning (CSR) beyond English. We collect the Mickey corpus, consisting of 561k sentences in 11 different languages, which can be used for analyzing and improving ML-LMs. We propose Mickey Probe, a language-general probing task for fairly evaluating the common sense of popular ML-LMs across different languages. In addition… 
Leveraging Knowledge in Multilingual Commonsense Reasoning
TLDR
This work proposes to use English as a pivot language, utilizing English knowledge sources for a commonsense reasoning framework via a translate-retrieve-translate (TRT) strategy, and demonstrates that TRT with external knowledge can significantly improve multilingual commonsense reasoning in both zero-shot and translate-train settings.
Wino-X: Multilingual Winograd Schemas for Commonsense Reasoning and Coreference Resolution
TLDR
This work presents Wino-X, a parallel dataset of German, French, and Russian schemas aligned with their English counterparts, to investigate whether neural machine translation (NMT) models can perform coreference resolution (CoR) that requires commonsense knowledge and whether multilingual language models (MLLMs) are capable of CSR across multiple languages.
RICA: Evaluating Robust Inference Capabilities Based on Commonsense Axioms
TLDR
A new challenge, RICA (Robust Inference using Commonsense Axioms), is introduced to evaluate robust commonsense inference despite textual perturbations; experiments show that pretrained language models (PTLMs) perform no better than random guessing in the zero-shot setting, are heavily impacted by statistical biases, and are not robust to perturbation attacks.
Transfer Learning for Multi-lingual Tasks - a Survey
TLDR
This survey provides a comprehensive overview of the existing literature with a focus on transfer learning techniques in multilingual tasks and identifies potential opportunities for further research in this domain.
AMMUS: A Survey of Transformer-based Pretrained Models in Natural Language Processing
TLDR
This comprehensive survey explains core concepts such as pretraining, pretraining methods, pretraining tasks, embeddings, and downstream adaptation methods, presents a new taxonomy of T-PTLMs, and gives a brief overview of various benchmarks, both intrinsic and extrinsic.
GeoMLAMA: Geo-Diverse Commonsense Probing on Multilingual Pre-Trained Language Models
TLDR
A framework for geo-diverse commonsense probing on multilingual PLMs (mPLMs) is introduced, along with a corresponding benchmark dataset, Geo-diverse Commonsense Multilingual Language Models Analysis (GeoMLAMA), on which mPLMs are benchmarked.
Commonsense Knowledge Reasoning and Generation with Pre-trained Language Models: A Survey
TLDR
A survey is presented of commonsense knowledge acquisition and reasoning tasks, and of the strengths and weaknesses of state-of-the-art pre-trained models for commonsense reasoning and generation as revealed by these tasks, together with reflections on future research directions.
Probing Commonsense Explanation in Dialogue Response Generation
TLDR
This study formalizes the problem by framing commonsense as a latent variable in the response generation (RG) task and using explanations for responses as a textual form of commonsense; it collects 6k annotated explanations justifying responses from four dialogue datasets and asks humans to verify them.
Challenges and Strategies in Cross-Cultural NLP
Various efforts in the Natural Language Processing (NLP) community have been made to accommodate linguistic diversity and serve speakers of many different languages. However, it is important to…
Think Before You Speak: Using Self-talk to Generate Implicit Commonsense Knowledge for Response Generation
TLDR
This paper presents a self-talk approach that first generates the implicit commonsense knowledge and then generates a response by referencing the externalized knowledge, all using one generative model.
...

References

XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning
TLDR
This work introduces Cross-lingual Choice of Plausible Alternatives (XCOPA), a typologically diverse multilingual dataset for causal commonsense reasoning in 11 languages; evaluation on XCOPA reveals that current methods based on multilingual pretraining and zero-shot fine-tuning transfer suffer from the curse of multilinguality and fall short of monolingual performance by a large margin.
X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained Language Models
TLDR
To properly handle language variations, a code-switching-based method is proposed to improve the ability of multilingual LMs to access knowledge, and its effectiveness is verified on several benchmark languages.
XNLI: Evaluating Cross-lingual Sentence Representations
TLDR
This work constructs an evaluation set for cross-lingual language understanding (XLU) by extending the development and test sets of the Multi-Genre Natural Language Inference Corpus to 14 languages, including low-resource languages such as Swahili and Urdu, and finds that XNLI represents a practical and challenging evaluation suite and that directly translating the test data yields the best performance among available baselines.
Multilingual LAMA: Investigating Knowledge in Multilingual Pretrained Language Models
TLDR
This work translates the established benchmarks TREx and GoogleRE into 53 languages and finds that using mBERT as a knowledge base yields varying performance across languages and that pooling predictions across languages improves performance.
XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization
TLDR
The Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) benchmark is introduced, a multi-task benchmark for evaluating the cross-lingual generalization capabilities of multilingual representations across 40 languages and 9 tasks.
Cross-lingual Language Model Pretraining
TLDR
This work proposes two methods to learn cross-lingual language models (XLMs): one unsupervised that only relies on monolingual data, and one supervised that leverages parallel data with a new cross-lingual language model objective.
XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation
TLDR
A recent cross-lingual pre-trained model, Unicoder, is extended to cover both understanding and generation tasks and is evaluated on XGLUE as a strong baseline, and the base versions of Multilingual BERT, XLM, and XLM-R are evaluated for comparison.
Unsupervised Cross-lingual Representation Learning at Scale
TLDR
It is shown that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks, and the possibility of multilingual modeling without sacrificing per-language performance is shown for the first time.
TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages
TLDR
A quantitative analysis of the data quality and example-level qualitative linguistic analyses of observed language phenomena that would not be found in English-only corpora are presented.
Language Models as Knowledge Bases?
TLDR
An in-depth analysis of the relational knowledge already present (without fine-tuning) in a wide range of state-of-the-art pretrained language models finds that BERT contains relational knowledge competitive with traditional NLP methods that have some access to oracle knowledge.
...