Language Evolution and Computation Bibliography

Our site (www.isrl.uiuc.edu/amag/langev) has been retired; please use https://langev.com instead.
Michael Dunn
2012
Science 337(6097):957--960, 2012
There are two competing hypotheses for the origin of the Indo-European language family. The conventional view places the homeland in the Pontic steppes about 6000 years ago. An alternative hypothesis claims that the languages spread from Anatolia with the expansion of farming 8000 to 9500 years ago. We used Bayesian phylogeographic approaches, together with basic vocabulary data from 103 ancient and contemporary Indo-European languages, to explicitly model the expansion of the family and test these hypotheses. We found decisive support for an Anatolian origin over a steppe origin. Both the inferred timing and root location of the Indo-European language trees fit with an agricultural expansion from Anatolia beginning 8000 to 9500 years ago. These results highlight the critical role that phylogeographic inference can play in resolving debates about human prehistory.
2011
Nature, 2011
Languages vary widely but not without limit. The central goal of linguistics is to describe the diversity of human languages and explain the constraints on that diversity. Generative linguists following Chomsky have claimed that linguistic diversity must be constrained by innate parameters that are set as a child learns a language. In contrast, other linguists following Greenberg have claimed that there are statistical tendencies for co-occurrence of traits reflecting universal systems biases, rather than absolute constraints or parametric variation. Here we use computational phylogenetic methods to address the nature of constraints on linguistic diversity in an evolutionary framework. First, contrary to the generative account of parameter setting, we show that the evolution of only a few word-order features of languages is strongly correlated. Second, contrary to the Greenbergian generalizations, we show that most observed functional dependencies between traits are lineage-specific rather than universal tendencies. These findings support the view that, at least with respect to word order, cultural evolution is the primary factor that determines linguistic structure, with the current state of a linguistic system shaping and constraining future states.
Universal typological dependencies should be detectable in the history of language families
Linguistic Typology 15(2):509--534, 2011
We claim that making sense of the typological diversity of languages demands a historical/evolutionary approach. We are pleased that the target paper (Dunn et al. 2011a) has served to bring discussion of this claim into prominence, and are grateful that leading ...
2009
PLoS Biology 7(11):e1000241, 2009
The region of the ancient Sahul continent (present day Australia and New Guinea, and surrounding islands) is home to extreme linguistic diversity. Even apart from the huge Austronesian language family, which spread into the area after the breakup of the Sahul continent in the Holocene, there are hundreds of languages from many apparently unrelated families. On each of the subcontinents, the generally accepted classification recognizes one large, widespread family and a number of unrelatable smaller families. If these language families are related to each other, it is at a depth which is inaccessible to standard linguistic methods. We have inferred the history of structural characteristics of these languages under an admixture model, using a Bayesian algorithm originally developed to discover populations on the basis of recombining genetic markers. This analysis identifies 10 ancestral language populations, some of which can be identified with clearly defined phylogenetic groups. The results also show traces of early dispersals, including hints at ancient connections between Australian languages and some Papuan groups (long hypothesized, never before demonstrated). Systematic language contact effects between members of big phylogenetic groups are also detected, which can in some cases be identified with a diffusional or substrate signal. Most interestingly, however, there remains striking evidence of a phylogenetic signal, with many languages showing negligible amounts of admixture.
2008
Structural phylogeny in historical linguistics: methodological explorations applied in Island Melanesia
Language 84(4):710--759, 2008
Using various methods derived from evolutionary biology, including maximum parsimony and Bayesian phylogenetic analysis, we tackle the question of the relationships among a group of Papuan isolate languages that have hitherto resisted accepted attempts ...
PLoS Genetics 4(10):e1000239, 2008
Recent studies have detailed a remarkable degree of genetic and linguistic diversity in Northern Island Melanesia. Here we utilize that diversity to examine two models of genetic and linguistic coevolution. The first model predicts that genetic and linguistic ...
2005
Science 309(5743):2072--2075, 2005
The contribution of language history to the study of the early dispersals of modern humans throughout the Old World has been limited by the shallow time depth (about 8000 ± 2000 years) of current linguistic methods. Here it is shown that the application of biological cladistic methods, not to vocabulary (as has been previously tried) but to language structure (sound systems and grammar), may extend the time depths at which language data can be used. The method was tested against well-understood families of Oceanic Austronesian languages, then applied to the Papuan languages of Island Melanesia, a group of hitherto unrelatable isolates. Papuan languages show an archipelago-based phylogenetic signal that is consistent with the current geographical distribution of languages. The most plausible hypothesis to explain this result is the divergence of the Papuan languages from a common ancestral stock, as part of late Pleistocene dispersals.
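The idea of applying cladistic methods to structural features can be illustrated with a minimal sketch of parsimony scoring. It assumes features are coded as binary presence/absence characters. The language labels A to D and the feature matrix below are hypothetical toy data, not drawn from any of the papers listed here. The sketch uses Fitch's algorithm to count the minimum number of feature changes each candidate tree requires. It then picks the cheapest of the three possible unrooted topologies for four taxa, which is a complete maximum-parsimony search only at this tiny size. All function names are illustrative, not taken from any published software.

```python
from typing import Dict, List, Set, Tuple, Union

# A rooted binary tree: a leaf is a language label (str); an internal node is a pair.
Tree = Union[str, Tuple["Tree", "Tree"]]

# Hypothetical toy data: four unnamed languages coded for four binary structural
# features (1 = feature present, 0 = absent). Not taken from any cited study.
TOY_MATRIX: Dict[str, List[int]] = {
    "A": [1, 0, 1, 1],
    "B": [1, 0, 1, 0],
    "C": [0, 1, 0, 0],
    "D": [0, 1, 0, 1],
}


def fitch(tree: Tree, column: Dict[str, int]) -> Tuple[Set[int], int]:
    """Fitch's algorithm for one character: return (state set at this node,
    minimum number of state changes in this subtree)."""
    if isinstance(tree, str):
        return {column[tree]}, 0
    left_set, left_cost = fitch(tree[0], column)
    right_set, right_cost = fitch(tree[1], column)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed here.
        return shared, left_cost + right_cost
    # No shared state: one change must occur on one of the two child branches.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree: Tree, matrix: Dict[str, List[int]]) -> int:
    """Sum the Fitch change counts over every feature (column) in the matrix."""
    n_features = len(next(iter(matrix.values())))
    total = 0
    for i in range(n_features):
        column = {lang: features[i] for lang, features in matrix.items()}
        total += fitch(tree, column)[1]
    return total


def four_taxon_topologies(taxa: List[str]) -> List[Tree]:
    """The three possible unrooted topologies for exactly four taxa,
    each written as a rooted pair of pairs."""
    a, b, c, d = taxa
    return [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]


def most_parsimonious(matrix: Dict[str, List[int]]) -> Tree:
    """Exhaustive maximum-parsimony search; feasible here only because n = 4."""
    trees = four_taxon_topologies(sorted(matrix))
    return min(trees, key=lambda t: parsimony_score(t, matrix))


if __name__ == "__main__":
    for t in four_taxon_topologies(sorted(TOY_MATRIX)):
        print(t, parsimony_score(t, TOY_MATRIX))
    print("best:", most_parsimonious(TOY_MATRIX))
```

On this toy matrix, the tree grouping A with B and C with D needs 5 changes, against 8 and 7 for the two alternatives, so it is returned as most parsimonious. Writing each unrooted topology as a pair of pairs is safe because the Fitch tree length does not depend on where the tree is rooted. Analyses of real language data involve many more taxa and features and rely on heuristic tree searches rather than exhaustive enumeration.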