Adversarial Evaluation for Models of Natural Language
@article{Smith2012AdversarialEF,
  title={Adversarial Evaluation for Models of Natural Language},
  author={Noah A. Smith},
  journal={ArXiv},
  year={2012},
  volume={abs/1207.0245}
}
We now have a rich and growing set of modeling tools and algorithms for inducing linguistic structure from text that is less than fully annotated. In this paper, we discuss some of the weaknesses of our current methodology. We present a new abstract framework for evaluating natural language processing (NLP) models in general and unsupervised NLP models in particular. The central idea is to make explicit certain adversarial roles among researchers, so that the different roles in an evaluation…
18 Citations
Stress Test Evaluation of Transformer-based Models in Natural Language Understanding Tasks
- Computer Science, LREC
- 2020
Evaluation of Transformer-based models on Natural Language Inference and Question Answering tasks reveals that RoBERTa, XLNet and BERT are more robust than recurrent neural network models to stress tests for both NLI and QA, while showing that there is still room for future improvement in this field.
Iterated learning framework for unsupervised part-of-speech induction
- Computer Science
- 2013
This thesis presents a generative Bayesian system that makes it easy to incorporate multiple diverse features spanning different levels of linguistic structure, such as morphology, lexical distribution, syntactic dependencies and word alignment information, allowing the examination of cross-linguistic patterns.
Stress Test Evaluation for Natural Language Inference
- Computer Science, COLING
- 2018
This work proposes an evaluation methodology consisting of automatically constructed “stress tests” that allow us to examine whether systems have the ability to make real inferential decisions, and reveals strengths and weaknesses of these models with respect to challenging linguistic phenomena.
Adversarial Examples for Evaluating Reading Comprehension Systems
- Computer Science, EMNLP
- 2017
This work proposes an adversarial evaluation scheme for the Stanford Question Answering Dataset that tests whether systems can answer questions about paragraphs that contain adversarially inserted sentences without changing the correct answer or misleading humans.
NoiseQA: Challenge Set Evaluation for User-Centric Question Answering
- Computer Science, EACL
- 2021
Substantial room for progress remains before QA systems can be effectively deployed; the work highlights the need for QA evaluation to expand to consider real-world use, and its findings should spur greater community interest in the issues that arise when systems actually need to be of utility to humans.
Robust Training under Linguistic Adversity
- Computer Science, EACL
- 2017
This work proposes a linguistically-motivated approach for training robust models based on exposing the model to corrupted text examples at training time, considering several flavours of linguistically plausible corruption, including lexical, semantic and syntactic methods.
Adversarial Stylometry in the Wild: Transferable Lexical Substitution Attacks on Author Profiling
- Computer Science, EACL
- 2021
A transformer-based extension of a lexical replacement attack achieves high transferability when trained on a weakly labeled corpus, decreasing target model performance below chance and providing a promising direction for future privacy-preserving adversarial attacks.
The Enemy in Your Own Camp: How Well Can We Detect Statistically-Generated Fake Reviews – An Adversarial Study
- Computer Science, ACL
- 2016
It is found that meta-information helps detection, but that NLP-generated reviews conditioned on such information are also much harder to detect than conventional ones.
Embedding for Evaluation of Topic Modeling Unsupervised Algorithms
- Computer Science
- 2022
The Word Embedding Topic Evaluation methodology helps identify efficient outcomes with better accuracy, and outperforms measures generally used for topic evaluation, such as coherence score and perplexity, in terms of topic quality and predictive performance.
Combination of Language Models for Word Prediction: An Exponential Approach
- Computer Science, IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2016
This paper proposes an exponential interpolation to merge a part-of-speech-based language model and a word-based $n$-gram…
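The exponential interpolation mentioned in this summary can be sketched as a log-linear (weighted geometric) combination of two component models, renormalized over the vocabulary. The function below is an illustrative assumption based on that description, not the paper's actual implementation; `lam` is a hypothetical mixing weight.

```python
def exponential_interpolation(p_word, p_pos, lam):
    """Merge two LM distributions over the same vocabulary via an
    exponential (log-linear) interpolation: p1^lam * p2^(1-lam),
    renormalized so the result is again a probability distribution."""
    # Weighted geometric combination of the two component probabilities.
    unnorm = {w: (p_word[w] ** lam) * (p_pos[w] ** (1.0 - lam)) for w in p_word}
    # Renormalize over the vocabulary.
    z = sum(unnorm.values())
    return {w: v / z for w, v in unnorm.items()}
```

With `lam = 1.0` the merged model reduces to the word-based model, and with `lam = 0.0` to the POS-based model; intermediate values trade off between the two.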
References
SHOWING 1-10 OF 13 REFERENCES
Guiding Unsupervised Grammar Induction Using Contrastive Estimation
- Computer Science
- 2005
It is shown that, using the same features, log-linear dependency grammar models trained using CE can drastically outperform EM-trained generative models on the task of matching human linguistic annotations (the MATCHLINGUIST task).
Reading Tea Leaves: How Humans Interpret Topic Models
- Computer Science, NIPS
- 2009
New quantitative methods for measuring semantic meaning in inferred topics are presented, showing that they capture aspects of the model that are undetected by previous measures of model quality based on held-out likelihood.
CoNLL-X Shared Task on Multilingual Dependency Parsing
- Computer Science, CoNLL
- 2006
How treebanks for 13 languages were converted into the same dependency format and how parsing performance was measured is described and general conclusions about multi-lingual parsing are drawn.
Machine Learning that Matters
- Computer Science, ICML
- 2012
This work presents six Impact Challenges to explicitly focus the field of machine learning's energy and attention, and discusses existing obstacles that must be addressed.
Large Language Models in Machine Translation
- Computer Science, EMNLP
- 2007
Describes a machine translation system in which a backoff score is determined as a function of a backoff factor and the relative frequency of the corresponding backoff n-gram in the corpus.
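The backoff scoring described in this summary (a shorter-context relative frequency scaled by a fixed backoff factor) can be sketched as follows. The function name, count-table layout, and the `alpha=0.4` default are illustrative assumptions, not code from the paper.

```python
def backoff_score(counts, ngram, alpha=0.4):
    """Score an n-gram as its relative frequency given its context;
    if unseen, back off to the shorter n-gram, scaled by a fixed
    backoff factor alpha.

    counts maps tuples of words to corpus counts; the empty tuple ()
    holds the total token count.
    """
    context = ngram[:-1]
    if counts.get(ngram, 0) > 0:
        # Relative frequency of the n-gram given its context.
        return counts[ngram] / counts.get(context, counts.get((), 1))
    if len(ngram) > 1:
        # Back off: drop the leftmost word and apply the backoff factor.
        return alpha * backoff_score(counts, ngram[1:], alpha)
    # Unseen unigram.
    return 0.0
```

Note that these scores are not normalized probabilities; the scheme trades normalization for simplicity and scalability on very large corpora.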
The Omphalos Context-Free Grammar Learning Competition
- Computer Science, ICGI
- 2004
The Omphalos Context-Free Grammar Learning Competition held as part of the International Colloquium on Grammatical Inference 2004 is described, including a new measure of the complexity of inferring context-free grammars, used to rank the competition problems.
Predicting Risk from Financial Reports with Regression
- Economics, NAACL
- 2009
This work applies well-known regression techniques to a large corpus of freely available financial reports, constructing regression models of volatility for the period following a report, rivaling past volatility in predicting the target variable.
Relations among Notions of Security for Public-Key Encryption Schemes
- Computer Science, Mathematics, IACR Cryptol. ePrint Arch.
- 1998
The goals of privacy and non-malleability are considered, each under chosen-plaintext attack and two kinds of chosen-ciphertext attack, and a new definition of non-malleability is proposed which the authors believe is simpler than the previous one.
Why do Nigerian Scammers Say They are From Nigeria?
- Computer Science, WEIS
- 2012
It is shown that as victim density decreases, the fraction of viable users that can be profitably attacked drops dramatically, suggesting that only by finding large numbers of victims can the attacker learn to accurately distinguish the two.