Language Evolution and Computation Bibliography

Our site (www.isrl.uiuc.edu/amag/langev) has been retired; please use https://langev.com instead.
Dan Klein
2013
Automated Reconstruction of Ancient Languages Using Probabilistic Models of Sound Change
PNAS 110(11):4224-4229, 2013
One of the oldest problems in linguistics is reconstructing the words that appeared in the protolanguages from which modern languages evolved. Identifying the forms of these ancient languages makes it possible to evaluate proposals about the nature of language change and to draw inferences about human history. Protolanguages are typically reconstructed using a painstaking manual process known as the comparative method. We present a family of probabilistic models of sound change as well as algorithms for performing inference in these models. The resulting system automatically and accurately reconstructs protolanguages from modern languages. We apply this system to 637 Austronesian languages, providing an accurate, large-scale automatic reconstruction of a set of protolanguages. Over 85% of the system’s reconstructions are within one character of the manual reconstruction provided by a linguist specializing in Austronesian languages. Being able to automatically reconstruct large numbers of languages provides a useful way to quantitatively explore hypotheses about the factors determining which sounds in a language are likely to change over time. We demonstrate this by showing that the reconstructed Austronesian protolanguages provide compelling support for a hypothesis about the relationship between the function of a sound and its probability of changing that was first proposed in 1955.
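The paper's actual models are far richer (phylogenetic trees, stochastic string transducers, learned sound-change parameters), but the core idea of probabilistic reconstruction can be illustrated with a toy sketch: assume a star-shaped tree, a fixed per-character substitution probability, and independent evolution at each aligned position, then pick the protoform that maximizes the likelihood of the observed modern forms. The alphabet, retention rate, and example forms below are invented for illustration.

```python
# Toy maximum-likelihood protoform reconstruction (not the paper's model):
# star tree, per-character substitution, independent positions.
from itertools import product

ALPHABET = "aptk"

def sub_prob(ancestor: str, descendant: str) -> float:
    """P(descendant char | ancestor char): characters usually survive,
    otherwise mutate uniformly. The 0.9 retention rate is an assumption."""
    return 0.9 if ancestor == descendant else 0.1 / (len(ALPHABET) - 1)

def likelihood(proto: str, modern_forms: list[str]) -> float:
    """Likelihood of aligned modern forms given a candidate protoform,
    with each branch and each position treated as independent."""
    p = 1.0
    for form in modern_forms:
        for a, d in zip(proto, form):
            p *= sub_prob(a, d)
    return p

def reconstruct(modern_forms: list[str]) -> str:
    """Brute-force ML search over all protoforms of the observed length."""
    length = len(modern_forms[0])
    candidates = ("".join(chars) for chars in product(ALPHABET, repeat=length))
    return max(candidates, key=lambda proto: likelihood(proto, modern_forms))

# Three daughter languages disagree at one position; ML favors the
# reconstruction requiring the fewest (least improbable) changes.
print(reconstruct(["pata", "pata", "paka"]))  # -> "pata"
```

In the real system, brute-force search is replaced by approximate inference over a full phylogeny, and the substitution probabilities are context-sensitive and estimated from the data rather than fixed.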
2009
Convergence Bounds for Language Evolution by Iterated Learning
Proceedings of the 31st Annual Conference of the Cognitive Science Society, 2009
Similarities between human languages are often taken as evidence of constraints on language learning. However, such similarities could also be the result of descent from a common ancestor. In the framework of iterated learning, language evolution converges to an equilibrium that is independent of its starting point, with the effect of shared ancestry decaying over time. Therefore, the central question is the rate of this convergence, which we formally analyze here. We show that convergence occurs in a number of generations that is O(n log n) for Bayesian learning of the ranking of n constraints or the values of n binary parameters. We also present simulations confirming this result and indicating how convergence is affected by the entropy of the prior distribution over languages.
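The setup the abstract analyzes can be simulated in a few lines: a language is a vector of n binary parameters, each learner sees noisy productions from the previous generation, computes a Bayesian posterior per parameter, and samples its own language from that posterior. The sketch below is a minimal illustration in this spirit; the noise model, prior, and constants are assumptions, not the paper's exact formulation.

```python
# Minimal iterated-learning simulation with n independent binary parameters.
# Starting from an extreme language, the chain forgets its starting point
# and drifts toward the stationary distribution determined by the prior.
import random

N_PARAMS = 20      # n binary parameters
PRIOR_ONE = 0.5    # prior probability that a parameter is 1 (assumed uniform)
NOISE = 0.1        # probability a produced example flips the parameter value
N_EXAMPLES = 3     # examples per parameter seen by each learner

def produce(language):
    """Noisy productions: each example flips its parameter with prob NOISE."""
    return [[v if random.random() > NOISE else 1 - v for _ in range(N_EXAMPLES)]
            for v in language]

def learn(data):
    """Sample a new language from the per-parameter Bayesian posterior."""
    language = []
    for examples in data:
        like1 = like0 = 1.0
        for x in examples:
            like1 *= (1 - NOISE) if x == 1 else NOISE
            like0 *= NOISE if x == 1 else (1 - NOISE)
        post1 = PRIOR_ONE * like1 / (PRIOR_ONE * like1 + (1 - PRIOR_ONE) * like0)
        language.append(1 if random.random() < post1 else 0)
    return language

# Begin with all parameters set to 1; the fraction of 1s decays toward the
# prior's expectation (0.5), with the effect of shared ancestry washing out.
language = [1] * N_PARAMS
for gen in range(1, 31):
    language = learn(produce(language))
    if gen % 5 == 0:
        print(f"gen {gen:2d}: fraction of 1s = {sum(language) / N_PARAMS:.2f}")
```

With a sharper (lower-entropy) prior, convergence to the equilibrium is faster, matching the abstract's observation that the rate depends on the entropy of the prior over languages.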