Principal components of sound systems: An exercise in multivariate statistical typology

Abstract

Phoneme inventories of the world’s languages, as represented in the UPSID database (Maddieson and Precoda 1990), are analyzed using the multivariate statistical techniques of principal components analysis, k-means clustering, and hierarchical clustering. The first two meaningful principal components are the dimensions that account for the most variance in sound systems without being driven by differences in the typological frequencies of phonemes. They are found to separate languages into three large clusters, distinguished by the glottal articulations present in the stop inventory and by the sonority of the other types of sounds present in the language. Clustering analyses, which automatically categorize sound systems and phonemes, are shown to reveal two kinds of groupings. The first kind is areal groupings of languages; for instance, genetically unrelated languages of India are categorized together. The second kind is groupings of phonemes that are often interpretable in featural terms, especially when the clustering is conducted within phoneme categories defined by manner of articulation or sonority.
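The following is a minimal, hypothetical sketch of the general kind of pipeline the abstract describes: principal components analysis of a binary language-by-phoneme presence matrix, followed by k-means clustering of the languages' component scores. It is not the author's procedure. The inventories and language names (lang1 to lang6) are invented for illustration and are not drawn from UPSID. Power iteration with deflation stands in for a library PCA routine to keep the example dependency-free, and farthest-first initialization of k-means is an assumed choice. The paper's method for setting aside variance due to typological phoneme frequency is not reproduced, and hierarchical clustering is not shown.

```python
import math

# Hypothetical toy data, invented for illustration (NOT taken from UPSID):
# rows are languages, columns are phonemes, 1 = phoneme present in the inventory.
PHONEMES = ["p", "t", "k", "p'", "t'", "k'", "m", "n", "l", "r"]
LANGUAGES = ["lang1", "lang2", "lang3", "lang4", "lang5", "lang6"]
MATRIX = [
    [1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 0, 1, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 0, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0, 1, 1, 1, 0],
    [1, 1, 0, 0, 0, 0, 1, 1, 1, 1],
]


def center_columns(rows):
    """Subtract each column's mean so PCA describes variance around the average inventory."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of the phoneme columns."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_components(cov, k, iters=1000):
    """Leading k unit eigenvectors of a symmetric PSD matrix via power iteration with deflation."""
    d = len(cov)
    a = [row[:] for row in cov]
    components = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(d)]  # arbitrary non-uniform start vector
        s = math.sqrt(sum(x * x for x in v))
        v = [x / s for x in v]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * a[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflate: remove the found direction so the next pass finds the next component.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return components


def project(centered, components):
    """Principal-component scores: each language's coordinates on the components."""
    return [[sum(x * c for x, c in zip(row, comp)) for comp in components]
            for row in centered]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with deterministic farthest-first initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(far)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


centered = center_columns(MATRIX)
components = top_components(covariance(centered), k=2)
scores = project(centered, components)
labels = kmeans(scores, k=2)
for lang, (pc1, pc2), lab in zip(LANGUAGES, scores, labels):
    print(f"{lang}: PC1={pc1:+.2f} PC2={pc2:+.2f} cluster={lab}")
```

In this toy matrix the first component mostly tracks the presence of the ejective stops and of /l/, so k-means on the scores groups lang1 to lang3 apart from lang4 to lang6. With real inventories, the number of clusters and components would have to be chosen and justified rather than fixed in advance as here.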

33 Figures and Tables

Cite this paper

@inproceedings{Kapatsinski2008PrincipalCO,
  title={Principal components of sound systems: An exercise in multivariate statistical typology},
  author={Vsevolod Kapatsinski},
  year={2008}
}