Neural Network Acceptability Judgments

@article{Warstadt2018NeuralNA,
  title={Neural Network Acceptability Judgments},
  author={Alex Warstadt and Amanpreet Singh and Samuel R. Bowman},
  journal={Transactions of the Association for Computational Linguistics},
  year={2019},
  volume={7},
  pages={625--641}
}
This paper investigates the ability of artificial neural networks to judge the grammatical acceptability of a sentence, with the goal of testing their linguistic competence. […] As baselines, we train several recurrent neural network models on acceptability classification, and find that our models outperform unsupervised models by Lau et al. (2016) on CoLA. Error analysis on specific grammatical phenomena reveals that both Lau et al.'s models and ours learn systematic generalizations like subject…
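CoLA is scored with the Matthews correlation coefficient (MCC), which stays informative despite the corpus's skew toward acceptable sentences. A minimal sketch of that evaluation step, with toy labels and predictions standing in for real model output:

```python
# Minimal sketch: scoring binary acceptability predictions CoLA-style.
# CoLA is evaluated with the Matthews correlation coefficient (MCC)
# rather than accuracy, since the corpus is skewed toward acceptable
# sentences. Labels and predictions below are illustrative only.
from sklearn.metrics import matthews_corrcoef

gold = [1, 1, 0, 1, 0, 0, 1, 1]  # 1 = acceptable, 0 = unacceptable
pred = [1, 1, 0, 0, 0, 1, 1, 1]

print(f"MCC: {matthews_corrcoef(gold, pred):.3f}")
```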

An LSTM Adaptation Study of (Un)grammaticality

The results show that, both in the difficult and highly symmetrical task of detecting subject islands and in the more open-ended CoLA dataset, grammatical sentences receive higher scores than ungrammatical ones, possibly because they are easier to integrate with the body of structural linguistic knowledge that the language model has accumulated.

Grammatical Analysis of Pretrained Sentence Encoders with Acceptability Judgments

A grammatically annotated development set for the Corpus of Linguistic Acceptability (CoLA; Warstadt et al., 2018) is introduced, which is used to investigate the grammatical knowledge of three pretrained encoders, including the popular OpenAI Transformer and BERT.

Word Frequency Does Not Predict Grammatical Knowledge in Language Models

Focusing on subject-verb agreement and reflexive anaphora, it is found that certain nouns are systematically understood better than others, an effect which is robust across grammatical tasks and different language models.

Using Integrated Gradients and Constituency Parse Trees to explain Linguistic Acceptability learnt by BERT

The decision-making process of BERT in distinguishing between linguistically acceptable (LA) and linguistically unacceptable (LUA) sentences is examined, and Layer Integrated Gradients attribution scores (LIG) are leveraged to explain the linguistic acceptability criteria that BERT learns on the Corpus of Linguistic Acceptability (CoLA) (Warstadt et al., 2018).
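As a rough illustration of the method, the captum library implements Layer Integrated Gradients; the sketch below attributes a BERT acceptability decision to input tokens. The checkpoint name is an assumption (any BERT fine-tuned on CoLA would do), and the pad-token baseline is one common choice, not necessarily the paper's.

```python
# Hedged sketch: attributing a CoLA acceptability decision to input tokens
# with Layer Integrated Gradients over BERT's embedding layer.
import torch
from captum.attr import LayerIntegratedGradients
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "textattack/bert-base-uncased-CoLA"  # assumed CoLA-tuned checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)
model.eval()

def forward(input_ids, attention_mask):
    return model(input_ids=input_ids, attention_mask=attention_mask).logits

enc = tok("The cats sleeps on the mat.", return_tensors="pt")
baseline = torch.full_like(enc["input_ids"], tok.pad_token_id)  # all-[PAD] baseline

lig = LayerIntegratedGradients(forward, model.bert.embeddings)
# target=1: attribution toward the "acceptable" class.
attrs = lig.attribute(enc["input_ids"], baselines=baseline,
                      additional_forward_args=(enc["attention_mask"],),
                      target=1)
scores = attrs.sum(dim=-1).squeeze(0)  # one attribution score per token
for t, s in zip(tok.convert_ids_to_tokens(enc["input_ids"][0]), scores):
    print(f"{t:>12s} {s:+.3f}")
```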

Grammaticality and Language Modelling

Some recent work on syntactically targeted linguistic evaluations is reappraised, arguing that while its experimental design sets a new high-water mark for this topic, its results may not prove what has been claimed.

Linguistic Analysis of Pretrained Sentence Encoders with Acceptability Judgments

It is concluded that recent sentence encoders, despite showing near-human performance on acceptability classification overall, still fail to make fine-grained grammaticality distinctions for many complex syntactic structures.

Investigating Representations of Verb Bias in Neural Language Models

DAIS, a large benchmark dataset containing 50K human judgments for 5K distinct sentence pairs in the English dative alternation, is introduced, showing that larger models perform better than smaller models and that transformer architectures tend to outperform recurrent architectures even under comparable parameter and training settings.

Targeted Syntactic Evaluation of Language Models

In an experiment using this data set, an LSTM language model performed poorly on many of the constructions, and a large gap remained between its performance and the accuracy of human participants recruited online.
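The underlying protocol is simple: present the model with minimal pairs and check whether it assigns higher probability to the grammatical member. A hedged sketch, with GPT-2 standing in for the paper's LSTM and an illustrative agreement pair:

```python
# Hedged sketch of a targeted syntactic evaluation: a language model
# "passes" a minimal pair if it assigns higher probability to the
# grammatical sentence. GPT-2 stands in for the LSTM used in the paper.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def log_prob(sentence: str) -> float:
    """Total log-probability of a sentence under the model."""
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    # out.loss is the mean per-token NLL; scale back to a total log-prob.
    return -out.loss.item() * (ids.shape[1] - 1)

good = "The author that the guards like laughs."
bad = "The author that the guards like laugh."
print(log_prob(good) > log_prob(bad))  # True iff the pair is scored correctly
```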

A Systematic Assessment of Syntactic Generalization in Neural Language Models

A systematic evaluation of the syntactic knowledge of neural language models, testing 20 combinations of model types and data sizes on a set of 34 English-language syntactic test suites, finds substantial differences in syntactic generalization performance across model architectures.

Cross-Linguistic Syntactic Evaluation of Word Prediction Models

CLAMS (Cross-Linguistic Assessment of Models on Syntax), a syntactic evaluation suite for monolingual and multilingual models, is introduced; it uses subject-verb agreement challenge sets for English, French, German, Hebrew, and Russian, generated from grammars developed by the authors.
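A toy version of the construction, assuming nothing about the actual CLAMS grammars: a small lexicon keyed by number is enough to generate (grammatical, ungrammatical) agreement pairs.

```python
# Hedged sketch of the CLAMS-style idea: generate subject-verb agreement
# minimal pairs from a tiny hand-written grammar. The English fragments
# below are illustrative; the actual CLAMS grammars cover five languages.
import itertools

SUBJECTS = {"sg": ["the pilot", "the farmer"], "pl": ["the pilots", "the farmers"]}
VERBS = {"sg": ["smiles", "laughs"], "pl": ["smile", "laugh"]}

def agreement_pairs():
    """Yield (grammatical, ungrammatical) sentence pairs."""
    for num, other in (("sg", "pl"), ("pl", "sg")):
        for subj, v_good, v_bad in itertools.product(
                SUBJECTS[num], VERBS[num], VERBS[other]):
            yield f"{subj} {v_good}.", f"{subj} {v_bad}."

for good, bad in itertools.islice(agreement_pairs(), 4):
    print(f"OK:  {good}\nBAD: {bad}")
```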
...

References


RNNs as psycholinguistic subjects: Syntactic state and grammatical dependency

It is demonstrated that these models represent and maintain incremental syntactic state, but that they do not always generalize in the same way as humans.

Grammatical Analysis of Pretrained Sentence Encoders with Acceptability Judgments

A grammatically annotated development set for the Corpus of Linguistic Acceptability (CoLA; Warstadt et al., 2018) is introduced, which is used to investigate the grammatical knowledge of three pretrained encoders, including the popular OpenAI Transformer and BERT.

Targeted Syntactic Evaluation of Language Models

In an experiment using this data set, an LSTM language model performed poorly on many of the constructions, and a large gap remained between its performance and the accuracy of human participants recruited online.

Grammaticality, Acceptability, and Probability: A Probabilistic View of Linguistic Knowledge

It is argued that the results of a set of large-scale experiments using crowd-sourced acceptability judgments, which demonstrate gradience to be a pervasive feature of acceptability judgments, support the view that linguistic knowledge can be intrinsically probabilistic.
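One concrete measure evaluated in this probabilistic framework is SLOR, which normalizes a sentence's language-model log-probability by its unigram log-probability and length so that word frequency and sentence length do not dominate the score. A sketch with illustrative stand-in numbers:

```python
# Hedged sketch of SLOR, one acceptability measure from this line of work:
# SLOR(s) = (log P_LM(s) - sum_i log P_unigram(w_i)) / |s|.
# The log-probabilities below are illustrative stand-ins, not model output.

def slor(lm_logprob: float, unigram_logprobs: list[float]) -> float:
    """Length- and frequency-normalized acceptability score."""
    n = len(unigram_logprobs)  # sentence length in words
    return (lm_logprob - sum(unigram_logprobs)) / n

# Toy numbers: a 5-word sentence with total LM log-prob -20.0 and
# the unigram log-probs of its words.
print(slor(-20.0, [-6.1, -3.2, -7.4, -2.8, -5.5]))
```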

Judging Grammaticality: Experiments in Sentence Classification

It is demonstrated that combining information from a variety of linguistic sources is helpful, and that the trade-off between accuracy on well-formed sentences and accuracy on ill-formed sentences can be fine-tuned by training multiple classifiers in a voting scheme.

Natural Language Grammatical Inference with Recurrent Neural Networks

It was found that certain architectures are better able to learn an appropriate grammar than others, and the extraction of rules in the form of deterministic finite state automata is investigated.

Predicting Grammaticality on an Ordinal Scale

This work constructs a statistical model of grammaticality using various linguistic features (e.g., misspelling counts, parser outputs, n-gram language model scores) and presents a new publicly available dataset of learner sentences judged for grammaticality on an ordinal scale.
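A minimal sketch of the general recipe, with synthetic features and labels standing in for the paper's data, using an ordered logit model; the specific feature set and model class here are assumptions for illustration.

```python
# Hedged sketch: predicting grammaticality on an ordinal scale from simple
# linguistic features. Features and labels are synthetic; the paper's real
# features include misspelling counts, parser outputs, and n-gram LM scores.
import numpy as np
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)
n = 200
# Toy features: [misspelling count, n-gram LM score]
X = np.column_stack([rng.poisson(1.0, n), rng.normal(-40, 10, n)])
# Toy ordinal labels 1-4 (1 = ungrammatical ... 4 = perfect), loosely
# tied to the features plus noise.
y = np.clip(np.round(4 - 0.8 * X[:, 0] + 0.02 * (X[:, 1] + 40)
                     + rng.normal(0, 0.5, n)), 1, 4).astype(int)

model = OrderedModel(y, X, distr="logit")
res = model.fit(method="bfgs", disp=False)
print(res.params)  # feature weights followed by ordinal thresholds
```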

Structural Supervision Improves Learning of Non-Local Grammatical Dependencies

It is found that the RNNG outperforms the LSTM on both types of grammatical dependencies and even learns many of the island constraints on the filler-gap dependency, suggesting that structural supervision provides data-efficiency advantages over purely string-based training of neural language models in acquiring human-like generalizations about non-local grammatical dependencies.

Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies

It is concluded that LSTMs can capture a non-trivial amount of grammatical structure given targeted supervision, but stronger architectures may be required to further reduce errors; furthermore, the language modeling signal is insufficient for capturing syntax-sensitive dependencies, and should be supplemented with more direct supervision if such dependencies need to be captured.

SNAP judgments: A small N acceptability paradigm (SNAP) for linguistic acceptability judgments

While published linguistic judgments sometimes differ from the judgments found in large-scale formal experiments with naive participants, there is not a consensus as to how often these…