Language Evolution and Computation Bibliography

Our site (www.isrl.uiuc.edu/amag/langev) has been retired; please use https://langev.com instead.
Dan Klein
2013
Automated Reconstruction of Ancient Languages Using Probabilistic Models of Sound Change
PNAS 110(11):4224-4229, 2013
One of the oldest problems in linguistics is reconstructing the words that appeared in the protolanguages from which modern languages evolved. Identifying the forms of these ancient languages makes it possible to evaluate proposals about the nature of language change and to draw inferences about human history. Protolanguages are typically reconstructed using a painstaking manual process known as the comparative method. We present a family of probabilistic models of sound change as well as algorithms for performing inference in these models. The resulting system automatically and accurately reconstructs protolanguages from modern languages. We apply this system to 637 Austronesian languages, providing an accurate, large-scale automatic reconstruction of a set of protolanguages. Over 85% of the system’s reconstructions are within one character of the manual reconstruction provided by a linguist specializing in Austronesian languages. Being able to automatically reconstruct large numbers of languages provides a useful way to quantitatively explore hypotheses about the factors determining which sounds in a language are likely to change over time. We demonstrate this by showing that the reconstructed Austronesian protolanguages provide compelling support for a hypothesis about the relationship between the function of a sound and its probability of changing that was first proposed in 1955.
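The paper's actual models are far richer (phylogenetic trees, stochastic string transducers, learned sound-change parameters), but the core idea of probabilistic reconstruction can be illustrated with a toy sketch: assume a star-shaped tree, a fixed per-character substitution probability, and independent evolution at each aligned position, then pick the protoform that maximizes the likelihood of the observed modern forms. The alphabet, retention rate, and example forms below are invented for illustration.

```python
# Toy maximum-likelihood protoform reconstruction (not the paper's model):
# star tree, per-character substitution, independent positions.
from itertools import product

ALPHABET = "aptk"

def sub_prob(ancestor: str, descendant: str) -> float:
    """P(descendant char | ancestor char): characters usually survive,
    otherwise mutate uniformly. The 0.9 retention rate is an assumption."""
    return 0.9 if ancestor == descendant else 0.1 / (len(ALPHABET) - 1)

def likelihood(proto: str, modern_forms: list[str]) -> float:
    """Likelihood of aligned modern forms given a candidate protoform,
    with each branch and each position treated as independent."""
    p = 1.0
    for form in modern_forms:
        for a, d in zip(proto, form):
            p *= sub_prob(a, d)
    return p

def reconstruct(modern_forms: list[str]) -> str:
    """Brute-force ML search over all protoforms of the observed length."""
    length = len(modern_forms[0])
    candidates = ("".join(chars) for chars in product(ALPHABET, repeat=length))
    return max(candidates, key=lambda proto: likelihood(proto, modern_forms))

# Three daughter languages disagree at one position; ML favors the
# reconstruction requiring the fewest (least improbable) changes.
print(reconstruct(["pata", "pata", "paka"]))  # -> "pata"
```

In the real system, brute-force search is replaced by approximate inference over a full phylogeny, and the substitution probabilities are context-sensitive and estimated from the data rather than fixed.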
2009
Convergence Bounds for Language Evolution by Iterated Learning
Proceedings of the 31st Annual Conference of the Cognitive Science Society, 2009
Similarities between human languages are often taken as evidence of constraints on language learning. However, such similarities could also be the result of descent from a common ancestor. In the framework of iterated learning, language evolution converges to an equilibrium that is independent of its starting point, with the effect of shared ancestry decaying over time. Therefore, the central question is the rate of this convergence, which we formally analyze here. We show that convergence occurs in a number of generations that is O(n log n) for Bayesian learning of the ranking of n constraints or the values of n binary parameters. We also present simulations confirming this result and indicating how convergence is affected by the entropy of the prior distribution over languages.
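The setup the abstract analyzes can be simulated in a few lines: a language is a vector of n binary parameters, each learner sees noisy productions from the previous generation, computes a Bayesian posterior per parameter, and samples its own language from that posterior. The sketch below is a minimal illustration in this spirit; the noise model, prior, and constants are assumptions, not the paper's exact formulation.

```python
# Minimal iterated-learning simulation with n independent binary parameters.
# Starting from an extreme language, the chain forgets its starting point
# and drifts toward the stationary distribution determined by the prior.
import random

N_PARAMS = 20      # n binary parameters
PRIOR_ONE = 0.5    # prior probability that a parameter is 1 (assumed uniform)
NOISE = 0.1        # probability a produced example flips the parameter value
N_EXAMPLES = 3     # examples per parameter seen by each learner

def produce(language):
    """Noisy productions: each example flips its parameter with prob NOISE."""
    return [[v if random.random() > NOISE else 1 - v for _ in range(N_EXAMPLES)]
            for v in language]

def learn(data):
    """Sample a new language from the per-parameter Bayesian posterior."""
    language = []
    for examples in data:
        like1 = like0 = 1.0
        for x in examples:
            like1 *= (1 - NOISE) if x == 1 else NOISE
            like0 *= NOISE if x == 1 else (1 - NOISE)
        post1 = PRIOR_ONE * like1 / (PRIOR_ONE * like1 + (1 - PRIOR_ONE) * like0)
        language.append(1 if random.random() < post1 else 0)
    return language

# Begin with all parameters set to 1; the fraction of 1s decays toward the
# prior's expectation (0.5), with the effect of shared ancestry washing out.
language = [1] * N_PARAMS
for gen in range(1, 31):
    language = learn(produce(language))
    if gen % 5 == 0:
        print(f"gen {gen:2d}: fraction of 1s = {sum(language) / N_PARAMS:.2f}")
```

With a sharper (lower-entropy) prior, convergence to the equilibrium is faster, matching the abstract's observation that the rate depends on the entropy of the prior over languages.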