Corpus ID: 1512061

Adversarial Evaluation for Models of Natural Language

Noah A. Smith
We now have a rich and growing set of modeling tools and algorithms for inducing linguistic structure from text that is less than fully annotated. In this paper, we discuss some of the weaknesses of our current methodology. We present a new abstract framework for evaluating natural language processing (NLP) models in general and unsupervised NLP models in particular. The central idea is to make explicit certain adversarial roles among researchers, so that the different roles in an evaluation… 


Stress Test Evaluation of Transformer-based Models in Natural Language Understanding Tasks
Evaluation of Transformer-based models on Natural Language Inference and Question Answering tasks reveals that RoBERTa, XLNet, and BERT are more robust than recurrent neural network models under stress tests for both NLI and QA, while showing that there is still room for improvement in this field.
Iterated learning framework for unsupervised part-of-speech induction
This thesis presents a generative Bayesian system that makes it easy to incorporate multiple diverse features spanning different levels of linguistic structure (morphology, lexical distribution, syntactic dependencies, and word alignment information), allowing for the examination of cross-linguistic patterns.
Stress Test Evaluation for Natural Language Inference
This work proposes an evaluation methodology consisting of automatically constructed “stress tests” that allow us to examine whether systems have the ability to make real inferential decisions, and reveals strengths and weaknesses of these models with respect to challenging linguistic phenomena.
Adversarial Examples for Evaluating Reading Comprehension Systems
This work proposes an adversarial evaluation scheme for the Stanford Question Answering Dataset that tests whether systems can answer questions about paragraphs that contain adversarially inserted sentences without changing the correct answer or misleading humans.
NoiseQA: Challenge Set Evaluation for User-Centric Question Answering
There is substantial room for progress before QA systems can be effectively deployed; this work highlights the need for QA evaluation to expand to consider real-world use, aiming to spur greater community interest in the issues that arise when systems actually need to be useful to humans.
Robust Training under Linguistic Adversity
This work proposes a linguistically motivated approach for training robust models based on exposing the model to corrupted text examples at training time, considering several flavours of linguistically plausible corruption, including lexical, semantic, and syntactic methods.
Adversarial Stylometry in the Wild: Transferable Lexical Substitution Attacks on Author Profiling
A transformer-based extension of a lexical replacement attack achieves high transferability when trained on a weakly labeled corpus, decreasing target model performance below chance and providing a promising direction for future privacy-preserving adversarial attacks.
The Enemy in Your Own Camp: How Well Can We Detect Statistically-Generated Fake Reviews – An Adversarial Study
It is found that meta-information helps detection, but that NLP-generated reviews conditioned on such information are also much harder to detect than conventional ones.
Embedding for Evaluation of Topic Modeling Unsupervised Algorithms
The Word Embedding Topic Evaluation methodology helps identify efficient outcomes with better accuracy, outperforming existing measures commonly used for topic evaluation, such as coherence score and perplexity, in terms of topic quality and predictive performance.
Combination of Language Models for Word Prediction: An Exponential Approach
This paper proposes an exponential interpolation to merge a part-of-speech-based language model and a word-based $n$-gram language model.
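The combination described above can be sketched as a log-linear (exponential) mixture of two model probabilities. The function name, the single interpolation weight `lam`, and the toy distributions below are illustrative assumptions, not the paper's exact formulation:

```python
def exp_interpolate(p_pos, p_word, lam):
    """Exponentially interpolate two language model distributions.

    Each candidate word w is scored as p_pos[w]**lam * p_word[w]**(1 - lam),
    then the scores are renormalized so the result is again a probability
    distribution over the shared vocabulary. The weight `lam` and the
    two-model setup are assumptions for illustration.
    """
    scores = {w: (p_pos[w] ** lam) * (p_word[w] ** (1.0 - lam)) for w in p_pos}
    z = sum(scores.values())
    return {w: s / z for w, s in scores.items()}

# Toy example: a POS-based model and a word-based model over two candidates.
p_pos = {"a": 0.6, "b": 0.4}
p_word = {"a": 0.3, "b": 0.7}
mixed = exp_interpolate(p_pos, p_word, lam=0.5)
```

With `lam=0.5` this is a normalized geometric mean of the two distributions; a `lam` closer to 1 weights the part-of-speech-based model more heavily.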


Guiding Unsupervised Grammar Induction Using Contrastive Estimation
It is shown that, using the same features, log-linear dependency grammar models trained with contrastive estimation (CE) can drastically outperform EM-trained generative models on the task of matching human linguistic annotations (the MATCHLINGUIST task).
Reading Tea Leaves: How Humans Interpret Topic Models
New quantitative methods for measuring semantic meaning in inferred topics are presented, showing that they capture aspects of the model that are undetected by previous measures of model quality based on held-out likelihood.
CoNLL-X Shared Task on Multilingual Dependency Parsing
This work describes how treebanks for 13 languages were converted into the same dependency format and how parsing performance was measured, and draws general conclusions about multilingual parsing.
Machine Learning that Matters
This work presents six Impact Challenges to explicitly focus the field of machine learning's energy and attention, and discusses existing obstacles that must be addressed.
Latent Dirichlet Allocation
Large Language Models in Machine Translation
Systems, methods, and computer program products for machine translation are provided, determining backoff scores as a function of a backoff factor and the relative frequency of the corresponding backoff n-gram in the corpus.
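The backoff scoring described above can be sketched as a recursive rule: score an n-gram by its relative frequency if it was observed, otherwise back off to the shortened n-gram discounted by a fixed factor. The count representation and the toy counts below are illustrative assumptions:

```python
def backoff_score(ngram, counts, alpha=0.4):
    """Backoff score of an n-gram (a tuple of words).

    If the full n-gram was seen, return its count divided by the count of
    its context; otherwise return alpha times the score of the n-gram with
    its first word dropped. The unigram base case is the relative frequency
    in the corpus. `counts` maps word tuples of every order to counts; this
    representation and alpha=0.4 are assumptions for illustration.
    """
    if len(ngram) == 1:
        total = sum(c for g, c in counts.items() if len(g) == 1)
        return counts.get(ngram, 0) / total
    if counts.get(ngram, 0) > 0:
        return counts[ngram] / counts[ngram[:-1]]
    return alpha * backoff_score(ngram[1:], counts, alpha)

# Toy counts: word tuples of order 1 and 2 with their frequencies.
counts = {
    ("the",): 3, ("cat",): 2, ("sat",): 1,
    ("the", "cat"): 2, ("cat", "sat"): 1,
}
seen = backoff_score(("the", "cat"), counts)    # observed bigram: 2/3
unseen = backoff_score(("the", "sat"), counts)  # backs off: 0.4 * (1/6)
```

Note that these scores are not normalized probabilities; avoiding normalization is precisely what makes this kind of backoff cheap at web scale.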
The Omphalos Context-Free Grammar Learning Competition
The Omphalos Context-Free Grammar Learning Competition held as part of the International Colloquium on Grammatical Inference 2004 is described, including a new measure of the complexity of inferring context-free grammars, used to rank the competition problems.
Predicting Risk from Financial Reports with Regression
This work applies well-known regression techniques to a large corpus of freely available financial reports, constructing regression models of volatility for the period following a report, rivaling past volatility in predicting the target variable.
Relations among Notions of Security for Public-Key Encryption Schemes
The goals of privacy and non-malleability are considered, each under chosen-plaintext attack and two kinds of chosen-ciphertext attack, and a new definition of non-malleability is proposed which the authors believe is simpler than the previous one.
Why do Nigerian Scammers Say They are From Nigeria?
It is shown that as victim density decreases, the fraction of viable users that can be profitably attacked drops dramatically, suggesting that only by finding large numbers of victims can the attacker learn to accurately distinguish the two.