Robust Natural Language Processing: Recent Advances, Challenges, and Future Directions

@article{omar_robust_nlp,
  title={Robust Natural Language Processing: Recent Advances, Challenges, and Future Directions},
  author={Marwan Omar and Soohyeon Choi and Daehun Nyang and David A. Mohaisen},
  journal={IEEE Access},
}
Recent natural language processing (NLP) techniques have achieved high performance on benchmark data sets, primarily due to significant advances in deep learning. These advances have led to great improvements in state-of-the-art production systems for NLP tasks, such as virtual assistants, speech recognition, and sentiment analysis. However, such NLP systems still often fail when tested with adversarial attacks. The initial lack of…


Quantifying the Performance of Adversarial Training on Language Models with Distribution Shifts

This paper examines the limitations of adversarial training under the temporal changes of machine learning models using a natural language task, and shows that certain adversarially trained models are even more prone to drift than others.

How Robust is your Fair Model? Exploring the Robustness of Diverse Fairness Strategies

To the best of the authors' knowledge, this work is the first to quantitatively evaluate the robustness of fairness optimisation strategies, and it can potentially serve as a guideline for choosing the most suitable fairness strategy for various data sets.

Examining the Characteristics of Practical Knowledge From Four Public Facebook Communities of Practice in Instructional Design and Technology

The study highlights the need for pedagogical foundations to support instructional and technical decisions, mechanisms for self-assessment of practical knowledge concerning IDT competencies, community protocols for addressing misconceptions about learning, onboarding materials for new members, and new topic structures to classify practical knowledge.

Models in the Wild: On Corruption Robustness of Neural NLP Systems

This paper introduces WildNLP, a framework for testing model stability in natural settings where text corruptions such as keyboard errors or misspellings occur, and compares the robustness of deep learning models from four popular NLP tasks by testing their performance on the aspects introduced in the framework.
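The kind of keyboard-error corruption described above can be sketched in a few lines. This is a minimal illustrative example, not code from WildNLP; the function name and the (deliberately partial) neighbor map are assumptions.

```python
import random

# Illustrative sketch of a keyboard-typo corruption: replace characters with
# ones from adjacent QWERTY keys to simulate typing errors.
# The neighbor map is a small hand-picked subset, not the full keyboard layout.
QWERTY_NEIGHBORS = {
    "a": "qwsz", "e": "wsdr", "i": "ujko", "o": "iklp",
    "n": "bhjm", "s": "awedxz", "t": "rfgy",
}

def keyboard_typos(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Replace a fraction (rate) of mapped characters with a neighboring key."""
    rng = random.Random(seed)  # seeded for reproducible corruptions
    chars = list(text)
    for idx, ch in enumerate(chars):
        if ch.lower() in QWERTY_NEIGHBORS and rng.random() < rate:
            chars[idx] = rng.choice(QWERTY_NEIGHBORS[ch.lower()])
    return "".join(chars)

print(keyboard_typos("the sentiment of this sentence is positive", rate=0.2))
```

A robustness test would feed both the clean and the corrupted text to a trained model and compare predictions; the corruption preserves length and word boundaries, so it isolates character-level noise from structural changes.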

Empirical evaluation of multi-task learning in deep neural networks for natural language processing

This paper conducts a thorough examination of five typical MTL methods with deep learning architectures across a broad range of representative NLP tasks to understand the merits and demerits of existing MTL methods in NLP tasks, thereby informing new hybrid architectures intended to combine their strengths.

Adversarial Attacks on Deep-learning Models in Natural Language Processing

A systematic survey of preliminary knowledge of NLP and related seminal works in computer vision is presented, which collects all related academic works since their first appearance in 2017 and analyzes 40 representative works in a comprehensive way.

Towards a Robust Deep Neural Network in Texts: A Survey

A taxonomy of adversarial attacks and defenses in texts from the perspective of different natural language processing (NLP) tasks is given, and how to build a robust DNN model via testing and verification is introduced.

Evaluating the Robustness of Neural Language Models to Input Perturbations

This study designs and implements various character-level and word-level perturbation methods to simulate realistic scenarios in which input texts may be slightly noisy or drawn from a distribution different from the one on which NLP systems were trained.
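Word-level perturbations of the sort the study describes can be sketched as simple list operations. The sketch below is hypothetical (function names and rates are assumptions) and shows two common variants: random word deletion and adjacent-word swapping.

```python
import random

def delete_words(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Drop each word independently with probability `rate`.

    Falls back to the original text if every word would be dropped.
    """
    rng = random.Random(seed)
    words = text.split()
    kept = [w for w in words if rng.random() >= rate]
    return " ".join(kept) if kept else text

def swap_adjacent(text: str, seed: int = 0) -> str:
    """Swap one randomly chosen pair of adjacent words."""
    rng = random.Random(seed)
    words = text.split()
    if len(words) < 2:
        return text
    i = rng.randrange(len(words) - 1)
    words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)
```

Because both operations keep the label of the input unchanged for most tasks, comparing a model's accuracy on original versus perturbed inputs gives a direct robustness measurement.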

Defense against Adversarial Attacks in NLP via Dirichlet Neighborhood Ensemble

Dirichlet Neighborhood Ensemble, a randomized smoothing method for training a robust model to defend against substitution-based attacks, is proposed; it consistently outperforms recently proposed defense methods by a significant margin across different network architectures and multiple data sets.

A Survey of Data Augmentation Approaches for NLP

This paper introduces and motivates data augmentation for NLP, discusses major methodologically representative approaches, and highlights techniques used for popular NLP applications and tasks.
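One of the simplest augmentation families the survey covers is noise-based augmentation, which generates label-preserving variants of each training example. A minimal sketch using word dropout (all names and rates are assumptions, not from the survey):

```python
import random

def augment(dataset, n_aug=2, drop_rate=0.1, seed=0):
    """Expand a list of (text, label) pairs with word-dropout variants.

    Each variant keeps the original label, since light word deletion
    rarely changes the meaning of the sentence for classification tasks.
    """
    rng = random.Random(seed)
    out = list(dataset)  # keep the originals
    for text, label in dataset:
        words = text.split()
        for _ in range(n_aug):
            kept = [w for w in words if rng.random() >= drop_rate] or words
            out.append((" ".join(kept), label))
    return out

data = [("this movie was great", "pos"), ("terrible plot", "neg")]
aug = augment(data)
print(len(aug))  # 6: the 2 originals plus 2 variants each
```

More sophisticated approaches in the survey (synonym replacement, back-translation, generative augmentation) follow the same contract: new (text, label) pairs whose labels are assumed unchanged.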

InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective

Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks. Recent studies, however, show that such BERT-based models are vulnerable when facing…

Interpreting the Robustness of Neural NLP Models to Textual Perturbations

This work conducts extensive experiments with four prominent NLP models (TextRNN, BERT, RoBERTa, and XLNet) over eight types of textual perturbations on three datasets, showing that a model which is better at identifying a perturbation becomes worse at ignoring that perturbation at test time (lower robustness), providing empirical support for the hypothesis.
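The robustness measurement underlying experiments like these can be expressed as the accuracy drop between clean and perturbed inputs. The following is a generic sketch (the `predict` interface and the toy model are assumptions, not from the paper):

```python
# Measure robustness as the accuracy drop from clean to perturbed inputs,
# for any classifier exposed as predict(list_of_texts) -> list_of_labels.
def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def robustness_drop(predict, texts, labels, perturb):
    clean_acc = accuracy(predict(texts), labels)
    pert_acc = accuracy(predict([perturb(t) for t in texts]), labels)
    return clean_acc - pert_acc  # larger drop = less robust

# Toy usage: a keyword "model" and an uppercasing perturbation.
toy_predict = lambda texts: ["pos" if "good" in t else "neg" for t in texts]
texts = ["good film", "GOOD film", "bad film"]
labels = ["pos", "pos", "neg"]
drop = robustness_drop(toy_predict, texts, labels, str.upper)
```

In the toy example the case-sensitive keyword model loses all of its signal under uppercasing, illustrating how a brittle feature produces a large drop; a real experiment would substitute a trained model and the paper's perturbation functions.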

CAT-Gen: Improving Robustness in NLP Models via Controlled Adversarial Text Generation

This work presents a Controlled Adversarial Text Generation (CAT-Gen) model that, given an input text, generates adversarial texts through controllable attributes that are known to be invariant to task labels.