A Probabilistic Generative Model of Linguistic Typology
@article{Bjerva2019APG,
  title   = {A Probabilistic Generative Model of Linguistic Typology},
  author  = {Johannes Bjerva and Yova Kementchedjhieva and Ryan Cotterell and Isabelle Augenstein},
  journal = {ArXiv},
  year    = {2019},
  volume  = {abs/1903.10950}
}
In the principles-and-parameters framework, the structural features of languages depend on parameters that may be toggled on or off, with a single parameter often dictating the status of multiple features. […]
Key Result: This finding has clear practical as well as theoretical implications: the results confirm what linguists have hypothesised, i.e. that there are significant correlations between typological features and languages.
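As a rough illustration of the covariance implied by the principles-and-parameters view (a hedged toy sketch, not the model proposed in the paper), the snippet below samples a handful of binary latent parameters per language and lets each parameter govern a block of binary typological features; features tied to the same parameter come out correlated across languages. All sizes, weights, and the logistic link are assumptions made for this illustration.

```python
# Toy sketch (not the paper's actual model): a few binary latent "parameters"
# each govern a block of observed binary typological features, so features
# governed by the same parameter covary across languages.
import numpy as np

rng = np.random.default_rng(0)

N_LANGUAGES = 500
N_PARAMS = 3      # hypothetical latent binary parameters (e.g. "head-final")
N_FEATURES = 9    # observed binary typological features, 3 per parameter

# Each feature is tied to exactly one latent parameter.
owner = np.repeat(np.arange(N_PARAMS), N_FEATURES // N_PARAMS)  # [0,0,0,1,1,1,2,2,2]
W = np.zeros((N_PARAMS, N_FEATURES))
W[owner, np.arange(N_FEATURES)] = 4.0   # parameter "on" pushes its features toward 1
b = np.full(N_FEATURES, -2.0)           # parameter "off" pushes them toward 0

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Sample latent parameter settings per language, then features conditionally.
Z = rng.binomial(1, 0.5, size=(N_LANGUAGES, N_PARAMS))
X = rng.binomial(1, sigmoid(Z @ W + b))  # (languages x features), binary

# The correlation matrix shows a block structure: features sharing a latent
# parameter are strongly correlated, features in different blocks are not.
print(np.round(np.corrcoef(X, rowvar=False), 2))
```

This block structure is the kind of inter-feature correlation that a generative typology model can exploit when predicting held-out feature values.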
19 Citations
Uncovering Probabilistic Implications in Typological Knowledge Bases
- Linguistics, Computer Science · ACL
- 2019
A computational model is presented that successfully identifies known universals, including Greenberg universals, and also uncovers new ones worthy of further linguistic investigation; it outperforms baselines previously used for this problem, as well as a strong baseline from knowledge base population.
SIGTYP 2020 Shared Task: Prediction of Typological Features
- Computer Science, Linguistics · SIGTYP
- 2020
It is revealed that even the strongest submitted systems struggle with predicting feature values for languages where few features are known, and that the most successful methods make use of correlations between typological features.
Bridging Linguistic Typology and Multilingual Machine Translation with Multi-view Language Representations
- Computer Science, Linguistics · EMNLP
- 2020
By inferring typological features and language phylogenies, the method can easily project and assess new languages without the expensive retraining of massive multilingual or ranking models that is a major disadvantage of related approaches.
NEMO: Frequentist Inference Approach to Constrained Linguistic Typology Feature Prediction in SIGTYP 2020 Shared Task
- Linguistics · SIGTYP
- 2020
This paper describes the NEMO submission to SIGTYP 2020 shared task (Bjerva et al., 2020) which deals with prediction of linguistic typological features for multiple languages using the data derived…
Does Typological Blinding Impede Cross-Lingual Sharing?
- Computer Science · EACL
- 2021
The model is based on a cross-lingual architecture in which the latent weights governing the sharing between languages are learnt during training; it is shown that preventing the model from exploiting typology severely reduces performance, while a control experiment reaffirms that encouraging sharing according to typology somewhat improves performance.
Language Embeddings for Typology and Cross-lingual Transfer Learning
- Computer Science, Linguistics · ACL
- 2021
This work generates dense embeddings for 29 languages using a denoising autoencoder, and evaluates the embeddings using the World Atlas of Language Structures (WALS) and two extrinsic tasks in a zero-shot setting: cross-lingual dependency parsing and cross-lingual natural language inference.
Inducing Language-Agnostic Multilingual Representations
- Computer Science, Linguistics · STARSEM
- 2021
Three approaches for removing language identity signals from multilingual embeddings are examined: re-aligning the vector spaces of target languages (all together) to a pivot source language, removing language-specific means and variances, and increasing input similarity across languages by removing morphological contractions and sentence reordering.
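The second of those three approaches, removing language-specific means and variances, amounts to standardising each language's embeddings separately before they are mixed in a shared space. A minimal sketch under an assumed interface (a dict of per-language embedding matrices; nothing here is taken from the paper's code):

```python
# Hedged sketch: remove per-language mean and variance from embeddings so that
# language identity is less linearly recoverable from the shared space.
import numpy as np

def standardise_per_language(embeddings_by_lang):
    """embeddings_by_lang: dict mapping a language code to an (n_i, d) array."""
    out = {}
    for lang, emb in embeddings_by_lang.items():
        mu = emb.mean(axis=0, keepdims=True)
        sigma = emb.std(axis=0, keepdims=True) + 1e-8  # avoid division by zero
        out[lang] = (emb - mu) / sigma
    return out

# Usage with random stand-in data for two languages.
rng = np.random.default_rng(0)
demo = {
    "en": rng.normal(1.0, 2.0, size=(100, 16)),
    "de": rng.normal(-0.5, 0.5, size=(100, 16)),
}
for lang, emb in standardise_per_language(demo).items():
    print(lang, float(emb.mean()), float(emb.std()))  # roughly 0 mean, unit variance
```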
Towards a Multi-view Language Representation: A Shared Space of Discrete and Continuous Language Features
- Computer Science
- 2019
This work computes a shared space between discrete (binary) and continuous features using canonical correlation analysis and evaluates the new language representation against a concatenation baseline on typological feature prediction and phylogenetic inference, obtaining promising results that invite further exploration.
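For concreteness, such a shared space could be computed roughly as in the sketch below, using scikit-learn's CCA between a binary (WALS-style) feature matrix and continuous language embeddings; the dimensions, the random stand-in data, and the choice of library are assumptions for illustration, not details taken from the paper.

```python
# Hedged sketch: a shared space between discrete (binary) typological features
# and continuous language embeddings via canonical correlation analysis.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)

n_languages = 120
X_discrete = rng.binomial(1, 0.5, size=(n_languages, 30)).astype(float)  # binary typological features
X_continuous = rng.normal(size=(n_languages, 20))                        # learned language embeddings

cca = CCA(n_components=8)
U, V = cca.fit_transform(X_discrete, X_continuous)  # projections into the shared space

# Averaging (or concatenating) the two projections gives one multi-view
# language representation to feed into downstream predictors.
shared = (U + V) / 2.0
print(shared.shape)  # (120, 8)
```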
Stop the Morphological Cycle, I Want to Get Off: Modeling the Development of Fusion
- Linguistics · SCIL
- 2020
In simulations using artificial data, this work provides quantitative support for two claims about agglutinative and fusional structures: that optional morphological markers discourage fusion from developing, but that stress-based vowel reduction encourages it.
Zero-Shot Cross-Lingual Transfer with Meta Learning
- Computer Science · EMNLP
- 2020
This work considers the setting of training models on multiple different languages at the same time, when little or no data is available for languages other than English, and demonstrates the consistent effectiveness of meta-learning for a total of 15 languages.
References
Showing 1-10 of 46 references
Diachrony-aware Induction of Binary Latent Representations from Typological Features
- Computer Science · IJCNLP
- 2017
A Bayesian model is proposed that represents each language as a sequence of binary latent parameters encoding inter-feature dependencies and relates a language's parameters to those of its phylogenetic and spatial neighbors; the proposed model recovers missing values more accurately than others, and the induced representations retain the phylogenetic and spatial signals observed for surface features.
From Phonology to Syntax: Unsupervised Linguistic Typology at Different Levels with Language Embeddings
- Linguistics · NAACL-HLT
- 2018
A core part of linguistic typology is the classification of languages according to linguistic properties, such as those detailed in the World Atlas of Language Structures (WALS). Doing this manually…
Probabilistic Typology: Deep Generative Models of Vowel Inventories
- Linguistics, Computer Science · ACL
- 2017
Linguistic typology studies the range of structures present in human language. The main goal of the field is to discover which sets of possible phenomena are universal, and which are merely frequent.…
Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing
- Computer Science, Linguistics · Computational Linguistics
- 2018
It is shown that, to date, the use of information in existing typological databases has resulted in consistent but modest improvements in system performance, due both to intrinsic limitations of the databases and to under-use of the typological features they include.
Learning Language Representations for Typology Prediction
- Computer Science, Linguistics · EMNLP
- 2017
Experiments show that the proposed method is able to infer not only syntactic but also phonological and phonetic inventory features, and improves over a baseline that has access to information about the languages' geographic and phylogenetic neighbors.
Parametric versus functional explanations of syntactic universals
- Linguistics
- 2008
This paper compares the generative principles-and-parameters approach to explaining syntactic universals with the functional-typological approach, and also discusses the intermediate approach of…
Past, Present, Future: A Computational Investigation of the Typology of Tense in 1000 Languages
- Linguistics, Computer Science · EMNLP
- 2017
SuperPivot, an analysis method for low-resource languages that occur in a superparallel corpus (i.e., a corpus containing an order of magnitude more languages than parallel corpora currently in use), is presented and shown to perform well for the cross-lingual analysis of the linguistic phenomenon of tense.
Semantic Drift in Multilingual Representations
- Linguistics, Computer Science · Computational Linguistics
- 2019
Results indicate that multilingual distributional representations that are only trained on monolingual text and bilingual dictionaries preserve relations between languages without the need for any etymological information.
What Do Language Representations Really Represent?
- Linguistics, Computer Science · Computational Linguistics
- 2019
This work investigates correlations and causal relationships between language representations learned from translations on the one hand, and genetic, geographical, and several levels of structural similarity between languages on the other, finding that structural similarity correlates most strongly with language representation similarity.
Polyglot Neural Language Models: A Case Study in Cross-Lingual Phonetic Representation Learning
- Computer Science · NAACL
- 2016
We introduce polyglot language models, recurrent neural network models trained to predict symbol sequences in many different languages using shared representations of symbols and conditioning on…