Probabilistic Typology: Deep Generative Models of Vowel Inventories

@article{Cotterell2017ProbabilisticTD,
  title={Probabilistic Typology: Deep Generative Models of Vowel Inventories},
  author={Ryan Cotterell and Jason Eisner},
  journal={ArXiv},
  year={2017},
  volume={abs/1705.01684}
}
Linguistic typology studies the range of structures present in human language. The main goal of the field is to discover which sets of possible phenomena are universal, and which are merely frequent. For example, all languages have vowels, while most---but not all---languages have an /u/ sound. In this paper we present the first probabilistic treatment of a basic question in phonological typology: What makes a natural vowel inventory? We introduce a series of deep stochastic point processes… 
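The abstract is cut off here, but the reference list below (the two determinantal point process entries and "Learning Determinantal Point Processes") suggests that DPPs are among the point-process models involved. As a hedged illustration only, and not the authors' actual model, the Python sketch below scores inventories drawn from a toy candidate vowel set with a DPP, P(V) proportional to det(L_V), so that sets of mutually dissimilar (dispersed) vowels receive more probability mass; the vowel coordinates and kernel settings are invented placeholders.

# Hedged sketch: a determinantal point process (DPP) over a toy candidate
# vowel set. P(inventory) is proportional to det(L_inventory), so inventories
# of mutually dissimilar (dispersed) vowels get more mass. The coordinates and
# kernel parameters are illustrative placeholders, not values from the paper.
import itertools
import numpy as np

# Rough positions in an abstract two-dimensional perceptual space (placeholders).
vowels = {"i": (0.2, 0.9), "u": (0.2, 0.1), "a": (0.9, 0.5),
          "e": (0.4, 0.8), "o": (0.4, 0.2)}
names = list(vowels)
X = np.array([vowels[v] for v in names])

# L-ensemble kernel: per-vowel quality times pairwise (RBF) similarity.
quality = np.ones(len(names))
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
L = np.outer(quality, quality) * np.exp(-sq_dists / 0.1)

def unnorm_prob(subset):
    idx = [names.index(v) for v in subset]
    return np.linalg.det(L[np.ix_(idx, idx)])

# Normalise over all non-empty subsets, then compare same-size inventories:
# the dispersed {i, u, a} outweighs front-crowded alternatives.
subsets = [s for r in range(1, len(names) + 1)
           for s in itertools.combinations(names, r)]
Z = sum(unnorm_prob(s) for s in subsets)
for inv in [("i", "u", "a"), ("i", "e", "a"), ("i", "e", "o")]:
    print(set(inv), round(unnorm_prob(inv) / Z, 4))

With these placeholder numbers the dispersed inventory {i, u, a} receives noticeably more mass than front-crowded inventories of the same size, which is the qualitative dispersion behaviour a diversity-preferring point process is meant to capture.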

Citations

A Deep Generative Model of Vowel Formant Typology
TLDR
This work tackles the problem of vowel system typology, i.e., which vowels a language contains, and develops a novel generative probability model that works directly with the acoustic information.
A Probabilistic Generative Model of Linguistic Typology
TLDR
This work develops a generative model of language based on exponential-family matrix factorisation and shows how structural similarities between languages can be exploited to predict typological features with near-perfect accuracy, outperforming several baselines on the task of predicting held-out features.
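A minimal sketch of the idea summarised in that TLDR, assuming a Bernoulli (logistic) instance of exponential-family matrix factorisation: each language and each binary typological feature receives a low-dimensional embedding, and held-out features are predicted from their dot product. The data, dimensions, and hyperparameters are toy placeholders, not the cited paper's setup.

# Hedged sketch: a Bernoulli (logistic) instance of exponential-family matrix
# factorisation over a binary language-by-feature matrix, with some cells held
# out and predicted from the learned embeddings. All data, sizes, and
# hyperparameters are toy placeholders.
import numpy as np

rng = np.random.default_rng(0)
n_langs, n_feats, k = 30, 20, 4

# Toy "true" low-rank structure generating binary typological features.
true_U = rng.normal(size=(n_langs, k))
true_V = rng.normal(size=(n_feats, k))
Y = (rng.random((n_langs, n_feats)) <
     1 / (1 + np.exp(-(true_U @ true_V.T)))).astype(float)

observed = rng.random((n_langs, n_feats)) < 0.8   # hold out ~20% of the cells

# Fit language embeddings U and feature embeddings V by gradient ascent on the
# Bernoulli log-likelihood of the observed cells only.
U = rng.normal(scale=0.1, size=(n_langs, k))
V = rng.normal(scale=0.1, size=(n_feats, k))
lr = 0.05
for _ in range(1000):
    P = 1 / (1 + np.exp(-(U @ V.T)))       # predicted P(feature = 1 | language)
    grad = observed * (Y - P)              # zero gradient on held-out cells
    U, V = U + lr * grad @ V, V + lr * grad.T @ U

print("held-out accuracy:", ((P > 0.5) == Y)[~observed].mean())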
From Phonology to Syntax: Unsupervised Linguistic Typology at Different Levels with Language Embeddings
A core part of linguistic typology is the classification of languages according to linguistic properties, such as those detailed in the World Atlas of Language Structures (WALS). Doing this manually …
Tracking Typological Traits of Uralic Languages in Distributed Language Representations
TLDR
This paper investigates which typological features are encoded in distributed representations of language by attempting to predict features in the World Atlas of Language Structures, and finds that some typological traits can be automatically inferred with accuracies well above a strong baseline.
On the Relation between Linguistic Typology and (Limitations of) Multilingual Language Modeling
TLDR
Fine-grained typological features such as exponence, flexivity, fusion, and inflectional synthesis are shown to be responsible for the proliferation of low-frequency phenomena that are inherently difficult for statistical architectures to model, or for the meaning ambiguity of character n-grams.
Uncovering Probabilistic Implications in Typological Knowledge Bases
TLDR
A computational model is presented that successfully identifies known universals, including Greenberg universals, but also uncovers new ones worthy of further linguistic investigation; it outperforms baselines previously used for this problem, as well as a strong baseline from knowledge base population.
Consonant co-occurrence classes and the feature-economy principle
The feature-economy principle is one of the key theoretical notions that have been postulated to account for the structure of phoneme inventories in the world's languages. In this paper, we test the …
Phonotactic Complexity and Its Trade-offs
TLDR
Methods are presented for calculating a measure of phonotactic complexity (bits per phoneme) that permits a straightforward cross-linguistic comparison, giving insight into how complex a language’s phonotactics is.
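A hedged reading of "bits per phoneme" is the per-phoneme cross-entropy, in bits, of a phoneme-level language model evaluated on held-out word forms; the add-one-smoothed bigram model and tiny toy lexicon below are stand-ins, not the cited paper's model or data.

# Hedged sketch: "bits per phoneme" as the per-symbol cross-entropy (base 2)
# of a phoneme-level language model on held-out forms. The smoothed bigram
# model and toy lexicon are illustrative stand-ins.
import math
from collections import Counter

train = ["kata", "taka", "kita", "tiki", "aka"]   # toy phoneme strings
heldout = ["kati", "taki"]

BOS, EOS = "<", ">"
bigrams = Counter()
unigrams = Counter()
for w in train:
    syms = [BOS] + list(w) + [EOS]
    unigrams.update(syms[:-1])
    bigrams.update(zip(syms[:-1], syms[1:]))

vocab = {c for w in train + heldout for c in w} | {EOS}

def logp(prev, cur):
    # Add-one smoothed bigram log-probability (base 2).
    return math.log2((bigrams[(prev, cur)] + 1) / (unigrams[prev] + len(vocab)))

total_bits, total_phonemes = 0.0, 0
for w in heldout:
    syms = [BOS] + list(w) + [EOS]
    total_bits -= sum(logp(p, c) for p, c in zip(syms[:-1], syms[1:]))
    total_phonemes += len(w)   # count phonemes, not the end-of-word symbol

print("bits per phoneme:", total_bits / total_phonemes)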
Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing
TLDR
It is shown that to date, the use of information in existing typological databases has resulted in consistent but modest improvements in system performance, due to both intrinsic limitations of databases and under-employment of the typological features included in them.
It is suggested that a new approach that adapts the broad and discrete nature of typological categories to the contextual and continuous nature of machine learning algorithms used in contemporary NLP could be facilitated by recent developments in data-driven induction of typological knowledge.

References

Showing 1-10 of 39 references
An Introduction to Linguistic Typology
This clear and accessible introduction to linguistic typology covers all linguistic domains, from phonology and morphology, through parts of speech, the NP and the VP, to simple and complex clauses, …
What is Phonological Typology
What is Phonological Typology? Larry M. Hyman, University of California, Berkeley. UC Berkeley Phonology Lab Annual Report (2014). Paper presented at the Workshop on Phonological Typology, University of …
Improved Lexical Acquisition through DPP-based Verb Clustering
TLDR
This work presents the first unified framework for unsupervised learning of subcategorization frames, selectional preferences and verb classes, and shows how to utilize Determinantal Point Processes, elegant probabilistic models that are defined over the possible subsets of a given dataset and give higher probability mass to high quality and diverse subsets, for clustering.
The Dispersion-Focalization Theory of vowel systems
The Dispersion-Focalization Theory (DFT) attempts to predict vowel systems based on the minimization of an energy function summing two perceptual components: global dispersion, which is based on …
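Schematically, and with notation of my own rather than necessarily the cited paper's exact formulation, the energy that the theory minimises over an inventory V can be written as

  E(V) = E_{\mathrm{disp}}(V) + \lambda \, E_{\mathrm{foc}}(V), \qquad
  E_{\mathrm{disp}}(V) = \sum_{i < j} \frac{1}{d_{ij}^{2}}, \qquad
  E_{\mathrm{foc}}(V) = - \sum_{i} \phi(v_i),

where d_{ij} is the perceptual distance between vowels i and j in formant space (so minimising E_{\mathrm{disp}} pushes vowels apart), \phi(v_i) rewards "focal" vowels whose adjacent formants lie close together, and \lambda weights the two components.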
The sounds of the world's languages
List of Figures. List of Tables. Acknowledgments. 1. The Sounds of the World's Languages. 2. Places of Articulation. 3. Stops. 4. Nasals and Nasalized Consonants. 5. Fricatives. 6. Laterals. 7. …
A course in phonetics
Part I, Introductory concepts: articulatory phonetics; phonology and phonetic transcription. Part II, English phonetics: the consonants of English; English vowels; English words and sentences. Part III …
Toward a universal law of generalization for psychological science.
A psychological space is established for any set of stimuli by determining metric distances between the stimuli such that the probability that a response learned to any stimulus will generalize to …
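The result alluded to here is Shepard's exponential law: once metric distances d(x, y) between stimuli are recovered for a psychological space, the probability that a response learned to stimulus x generalizes to stimulus y falls off approximately exponentially with that distance. In schematic form (the decay constant k is unspecified and stimulus-set dependent):

  g(x, y) \approx e^{-k \, d(x, y)}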
Determinantal Point Processes for Machine Learning
TLDR
Determinantal Point Processes for Machine Learning provides a comprehensible introduction to DPPs, focusing on the intuitions, algorithms, and extensions that are most relevant to the machine learning community, and shows how they can be applied to real-world applications.
Learning Determinantal Point Processes
TLDR
This thesis shows how determinantal point processes can be used as probabilistic models for binary structured problems characterized by global, negative interactions, and demonstrates experimentally that the techniques introduced allow DPPs to be used for real-world tasks like document summarization, multiple human pose estimation, search diversification, and the threading of large document collections.
A Learning Algorithm for Boltzmann Machines
TLDR
A general parallel search method is described, based on statistical mechanics, and it is shown how it leads to a general learning rule for modifying the connection strengths so as to incorporate knowledge about a task domain in an efficient way.
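The learning rule summarised in that TLDR is, in its standard form, a contrast between pairwise co-activation statistics collected with the training data clamped on the visible units and with the network running freely:

  \Delta w_{ij} \propto \langle s_i s_j \rangle_{\text{clamped}} - \langle s_i s_j \rangle_{\text{free}},

where s_i and s_j are binary unit states and w_{ij} is the connection strength between units i and j.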