Automated Dating of the World’s Language Families Based on Lexical Similarity

Eric W. Holman, Cecil H. Brown, Søren Wichmann, Andreas Muller, Viveka Velupillai, Harald Hammarström, Sebastian Sauppe, Hoe-Chun Jung, Roderick J Bakker, Patrick H. Brown, Orlin Belyaev, Matthias Urban, Robert Mailhammer, Johann-Mattis List, D. B. Egorov
Current Anthropology, pp. 841-875

This paper describes a computerized alternative to glottochronology for estimating elapsed time since parent languages diverged into daughter languages. The method, developed by the Automated Similarity Judgment Program (ASJP) consortium, is different from glottochronology in four major respects: (1) it is automated and thus is more objective, (2) it applies a uniform analytical approach to a single database of worldwide languages, (3) it is based on lexical similarity as determined from… 

Phonotactic Diversity Predicts the Time Depth of the World’s Language Families

A new automated dating method based on phonotactic diversity is proposed. It does not require any information on the internal classification of a language group, and it can use all the available word lists for a language and its dialects, which sidesteps the debate on ‘language’ vs. ‘dialect’.

Automated methods for the investigation of language contact, with a focus on lexical borrowing

This study provides a concise introduction to the most important approaches to lexical borrowing, presenting methods that use phylogenetic networks to detect reticulation events during language history, sequence comparison methods in order to identify borrowings in multilingual datasets, and arguments for the borrowability of shared traits to decide if traits have been borrowed or inherited.

Correlates of reticulation in linguistic phylogenies

The interpretation is that δ is a realistic measure of reticulation and sensitive to effects of socio-historical events such as language extinction.

Endangered language families

Linguists have increased their documentation efforts in response to the sharp decline in the number of languages. Greater awareness and new sources of funding have led to an upsurge in language

Towards identifying the optimal datasize for lexically-based Bayesian inference of linguistic phylogenies

The optimal number of meanings required for the best performance in Bayesian phylogenetic inference appears to correlate with the number of languages under consideration, and the results of the two methods vary across families.

The Potential of Automatic Word Comparison for Historical Linguistics

This study tests the potential of automatic methods to detect etymologically related words (cognates) in cross-linguistic data and identifies the specific strengths and weaknesses of these different methods.

Testing methods of linguistic homeland detection using synthetic data

This work carries out performance testing by simulating language families, including branching structures and word lists, along with speaker populations moving in space, and proposes a hierarchy of performance of the different methods.

Lexibank, a public repository of standardized wordlists with computed phonological and lexical features

A new approach to increase the comparability of cross-linguistic lexical data by designing workflows for the computer-assisted lifting of datasets to Cross-Linguistic Data Formats, a collection of standards that make these datasets more Findable, Accessible, Interoperable, and Reusable (FAIR).

Linked Data for Linguistic Diversity Research: Glottolog/Langdoc and ASJP Online

S. Nordhoff. Linked Data in Linguistics, 2012.
These two projects are the first attempt at a Typological Linked Data Cloud, to which PHOIBLE and other resources can easily be added in the future.

Sequence comparison in computational historical linguistics

This tutorial will briefly introduce the basic concepts behind the algorithms employed by LingPy, and illustrate in concrete workflows how automatic sequence comparison can be applied to multi-lingual word lists.



Automated classification of the world’s languages: a description of the method and preliminary results

An approach to the classification of languages through automated lexical comparison is described. This method produces near-expert classifications. At the core of the approach is the

Indo-European languages tree by Levenshtein distance

This work introduces a genetic distance between language pairs by taking a renormalized Levenshtein distance between words with the same meaning and averaging over all words contained in a Swadesh list. The resulting tree closely resembles the one published in Gray and Atkinson (2003), with some significant differences.
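The distance described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the renormalization divides the edit distance by the length of the longer word, and the function names and the word-list layout (a dict from meaning to word form) are hypothetical choices made for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer word's length (assumed normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def language_distance(list_a: dict, list_b: dict) -> float:
    """Average normalized distance over the meanings present in both lists."""
    shared = [meaning for meaning in list_a if meaning in list_b]
    if not shared:
        raise ValueError("the two word lists share no meanings")
    total = sum(normalized_distance(list_a[m], list_b[m]) for m in shared)
    return total / len(shared)
```

Calling `language_distance` on every pair of word lists yields a distance matrix from which a tree can then be built. Dividing by the longer word's length keeps each per-word value between 0 and 1, so long and short words contribute on the same scale.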

Explorations in automated language classification

This paper discusses a refinement of the method for automating language classification based on the 100-item referent list of Swadesh. The refinement calculates the relative stabilities of list items and reduces the list to a shorter one by eliminating the least stable items.

Continuity and divergence in the Bantu languages: perspectives from a lexicostatistic study

When a group of rather closely related languages such as Bantu covers a large contiguous area, a genetic tree can help trace the history of the region. Nurse provides a good overview of the history

On the Accuracy of Language Trees

A thorough survey of the accuracy of reconstruction methods as compared with the Ethnologue expert classifications is conducted, focusing in particular on state-of-the-art distance-based methods for phylogeny reconstruction using worldwide linguistic databases.

Linguistic Divergence in Romance

A method of quantifying judgments of relative 'closeness' or 'distance' between related languages is presented, along with some results of its application.

Positing Language Relationships Using ALINE

This paper generates trees from distance matrices created by the language distance metrics using two different algorithms developed by computational biologists: Neighbor Joining and UPGMA, and compares them with expert trees based on those compiled by the Ethnologue project.
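To show how a tree can be derived from a distance matrix, here is a minimal UPGMA sketch. It is an illustration only and not the implementation used in the paper: the function name, the nested-tuple tree representation, and the `frozenset`-keyed distance dictionary are assumptions made for the example, and branch lengths are omitted.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster `names` by UPGMA and return the tree as nested tuples.

    `dist` maps frozenset({a, b}) to the distance between a and b
    (hypothetical input layout chosen for this sketch).
    """
    sizes = {name: 1 for name in names}  # cluster label -> number of leaves
    d = {frozenset(p): dist[frozenset(p)] for p in combinations(names, 2)}
    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        na, nb = sizes.pop(a), sizes.pop(b)
        # The distance from the merged cluster to every other cluster is the
        # size-weighted average of the two old distances.
        for c in sizes:
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        sizes[merged] = na + nb
    return next(iter(sizes))
```

UPGMA assumes roughly constant rates of change across branches. Neighbor Joining drops that assumption at the cost of a slightly more involved pair-selection step, which is one reason the two algorithms can yield different trees from the same matrix.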

How Accurate and Robust Are the Phylogenetic Estimates of Austronesian Language Relationships?

The results show that the Austronesian language phylogenies are highly congruent with the traditional subgroupings, and the date estimates are robust even when calculated using a restricted set of historical calibrations.

Austronesian language phylogenies: myths and misconceptions about Bayesian computational methods

Phylogenetic analyses of structural features have revealed historical signals in Papuan and reflected a settlement pattern through Island South-East Asia, New Guinea and then into Oceania, consistent with the ‘Out of Taiwan’ scenario.