Can LSTM Learn to Capture Agreement? The Case of Basque

@inproceedings{Ravfogel2018CanLL,
  title={Can LSTM Learn to Capture Agreement? The Case of Basque},
  author={Shauli Ravfogel and Francis M. Tyers and Yoav Goldberg},
  booktitle={BlackboxNLP@EMNLP},
  year={2018}
}
Sequential neural network models are powerful tools in a variety of Natural Language Processing (NLP) tasks. [...] Key result: tentative findings based on diagnostic classifiers suggest that the network relies on local heuristics as a proxy for the hierarchical structure of the sentence. We propose the Basque agreement prediction task as a challenging benchmark for models that attempt to learn regularities in human language.
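The diagnostic-classifier methodology mentioned in the abstract can be illustrated with a toy sketch (this is not the paper's code, and the "hidden states" below are synthetic): a simple probe is trained on a network's hidden-state vectors to test whether an agreement feature, such as subject number, is linearly decodable from them.

```python
import random

random.seed(0)

def fake_state(number):
    """Synthetic 4-d 'hidden state'; one dimension weakly encodes number."""
    vec = [random.gauss(0.0, 1.0) for _ in range(4)]
    vec[2] += 2.0 if number == "pl" else -2.0
    return vec

# 50 singular and 50 plural states, labelled with the feature to probe for.
data = [(fake_state(n), n) for n in ("sg", "pl") for _ in range(50)]

def mean(vectors):
    """Column-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

# Nearest-class-mean probe: the simplest possible diagnostic classifier.
mu = {lab: mean([x for x, y in data if y == lab]) for lab in ("sg", "pl")}

def classify(x):
    def sq_dist(m):
        return sum((a - b) ** 2 for a, b in zip(x, m))
    return min(mu, key=lambda lab: sq_dist(mu[lab]))

# High accuracy indicates the feature is recoverable from the states.
accuracy = sum(classify(x) == y for x, y in data) / len(data)
```

If the probe separates the classes well, the agreement feature is encoded in the states; if real hidden states from a trained LSTM were substituted for `fake_state`, the same probe would test what the network has actually learned.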
Citations

Studying the Inductive Biases of RNNs with Synthetic Variations of Natural Languages
TLDR
A paradigm is proposed that creates synthetic versions of English differing from English in one or more typological parameters, and generates corpora for those languages from a parsed English corpus, finding that overt morphological case makes agreement prediction significantly easier, regardless of word order.
What Should/Do/Can LSTMs Learn When Parsing Auxiliary Verb Constructions?
TLDR
It is found that the parser learns different information about AVCs and FMVs if only sequential models are used in the architecture, but similar information when a recursive layer is used, suggesting there may be benefits to using a recursive layer in dependency parsing.
Cross-Linguistic Syntactic Evaluation of Word Prediction Models
TLDR
CLAMS (Cross-Linguistic Assessment of Models on Syntax), a syntactic evaluation suite for monolingual and multilingual models, is introduced; it uses subject-verb agreement challenge sets for English, French, German, Hebrew, and Russian, generated from grammars developed for the task.
Neural network learning of the Russian genitive of negation: optionality and structure sensitivity
TLDR
This paper investigates the neural network learning of the Russian genitive of negation and finds that the recurrent neural network language model tested can learn this grammaticality pattern, although it is not clear whether it learns the locality constraint on the genitive objects.
CLiMP: A Benchmark for Chinese Language Model Evaluation
TLDR
The corpus of Chinese linguistic minimal pairs (CLiMP) is introduced to investigate what knowledge Chinese LMs acquire, and it is found that classifier–noun agreement and verb complement selection are the phenomena that models generally perform best at.
Modeling German Verb Argument Structures: LSTMs vs. Humans
TLDR
A German grammaticality dataset is introduced in which ungrammatical sentences are constructed by manipulating case assignments (e.g., substituting nominative with accusative or dative), finding that LSTMs are better than chance at detecting incorrect argument structures and slightly worse than humans tested on the same dataset.
Morph Call: Probing Morphosyntactic Content of Multilingual Transformers
TLDR
This work presents Morph Call, a suite of 46 probing tasks for four Indo-European languages of differing morphology (Russian, French, English, and German), and proposes a new type of probing task based on the detection of guided sentence perturbations.
Uncovering Constraint-Based Behavior in Neural Models via Targeted Fine-Tuning
TLDR
It is shown that competing processes in a language act as constraints on model behavior, and it is demonstrated that targeted fine-tuning can re-weight the learned constraints, uncovering otherwise dormant linguistic knowledge in models.
A Systematic Analysis of Morphological Content in BERT Models for Multiple Languages
TLDR
The experiments contained herein show that transformer architectures largely partition their embedding space into convex sub-regions highly correlated with morphological feature values, and that the contextualized nature of transformer embeddings allows models to distinguish ambiguous morphological forms in many, but not all, cases.
Recurrent babbling: evaluating the acquisition of grammar from limited input data
TLDR
The behaviour of the network is analysed over time using a novel methodology that quantifies the level of grammatical abstraction in the model's generated output (its "babbling") relative to the language it has been exposed to, showing that the LSTM indeed abstracts new structures as learning proceeds.

References

Showing 1–10 of 27 references
Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies
TLDR
It is concluded that LSTMs can capture a non-trivial amount of grammatical structure given targeted supervision, but stronger architectures may be required to further reduce errors; furthermore, the language modeling signal is insufficient for capturing syntax-sensitive dependencies, and should be supplemented with more direct supervision if such dependencies need to be captured. Expand
Using Deep Neural Networks to Learn Syntactic Agreement
TLDR
DNNs require large vocabularies to form substantive lexical embeddings in order to learn structural patterns, and this finding has interesting consequences for the understanding of the way in which DNNs represent syntactic information. Expand
Colorless green recurrent networks dream hierarchically
TLDR
Support is brought to the hypothesis that RNNs are not just shallow-pattern extractors, but they also acquire deeper grammatical competence by making reliable predictions about long-distance agreement and do not lag much behind human performance. Expand
Memory Architectures in Recurrent Neural Network Language Models
TLDR
The results demonstrate the value of stack-structured memory for explaining the distribution of words in natural language, in line with linguistic theories claiming a context-free backbone for natural language. Expand
Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks
TLDR
This work proposes a framework that facilitates better understanding of the encoded representations of sentence vectors and demonstrates the potential contribution of the approach by analyzing different sentence representation mechanisms. Expand
Deep Biaffine Attention for Neural Dependency Parsing
TLDR
This paper uses a larger but more thoroughly regularized parser than other recent BiLSTM-based approaches, with biaffine classifiers to predict arcs and labels, and shows which hyperparameter choices had a significant effect on parsing accuracy, allowing it to achieve large gains over other graph-based approaches.
Visualizing and Understanding Recurrent Networks
TLDR
This work uses character-level language models as an interpretable testbed to provide an analysis of LSTM representations, predictions and error types, and reveals the existence of interpretable cells that keep track of long-range dependencies such as line lengths, quotes and brackets. Expand
Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context
TLDR
This paper investigates the role of context in an LSTM LM, through ablation studies, and analyzes the increase in perplexity when prior context words are shuffled, replaced, or dropped to provide a better understanding of how neural LMs use their context. Expand
Exploring the Limits of Language Modeling
TLDR
This work explores recent advances in Recurrent Neural Networks for large-scale Language Modeling and extends current models to deal with two key challenges present in this task: corpora and vocabulary sizes, and the complex, long-term structure of language.
On the State of the Art of Evaluation in Neural Language Models
TLDR
This work reevaluate several popular architectures and regularisation methods with large-scale automatic black-box hyperparameter tuning and arrives at the somewhat surprising conclusion that standard LSTM architectures, when properly regularised, outperform more recent models. Expand