Language Evolution and Computation Bibliography

Our site (www.isrl.uiuc.edu/amag/langev) has been retired; please use https://langev.com instead.
Quentin D. Atkinson
2018
Nature Ecology & Evolution 2:741-749, 2018
It remains a mystery how Pama–Nyungan, the world’s largest hunter-gatherer language family, came to dominate the Australian continent. Some argue that social or technological advantages allowed rapid language replacement from the Gulf Plains region during the mid-Holocene. Others have proposed expansions from refugia linked to climatic changes after the last ice age or, more controversially, during the initial colonization of Australia. Here, we combine basic vocabulary data from 306 Pama–Nyungan languages with Bayesian phylogeographic methods to explicitly model the expansion of the family across Australia and test between these origin scenarios. We find strong and robust support for a Pama–Nyungan origin in the Gulf Plains region during the mid-Holocene, implying rapid replacement of non-Pama–Nyungan languages. Concomitant changes in the archaeological record, together with a lack of strong genetic evidence for Holocene population expansion, suggest that Pama–Nyungan languages were carried as part of an expanding package of cultural innovations that probably facilitated the absorption and assimilation of existing hunter-gatherer groups. A Bayesian phylogeographic analysis of vocabulary from 306 Pama–Nyungan languages suggests that the language family rose to dominance across Australia in a process of rapid replacement following an origin in the Gulf Plains region during the mid-Holocene.
2017
School of Psychology, University of Auckland, Auckland, New Zealand, 2017
We present a new open source software tool called BEASTling, designed to simplify the preparation of Bayesian phylogenetic analyses of linguistic data using the BEAST 2 platform. BEASTling transforms comparatively short and human-readable configuration files into the XML files used by BEAST to specify analyses. By taking advantage of Creative Commons-licensed data from the Glottolog language catalog, BEASTling allows the user to conveniently filter datasets using names for recognised language families, to impose monophyly constraints so that inferred language trees are backward compatible with Glottolog classifications, or to assign geographic location data to languages for phylogeographic analyses. Support for the emerging cross-linguistic linked data format (CLDF) permits easy incorporation of data published in cross-linguistic linked databases into analyses. BEASTling is intended to make the power of Bayesian analysis more accessible to historical linguists without strong programming backgrounds, in the hopes of encouraging communication and collaboration between those developing computational models of language evolution (who are typically not linguists) and relevant domain experts.
2013
PNAS 110(11):4159-60, 2013
The word for “sky” in the indigenous Saaroa language of Taiwan is laŋica. Across the South China Sea in the Philippines, the speakers of Ilonggo use laŋit, whereas, on the far-flung islands of the Pacific, Hawaiians say lani and Rarotongans and New Zealand Maori raŋi (1). Systematic sound correspondences between many such words tell us that these languages have evolved from a common ancestor to form part of the Austronesian language family. By meticulously comparing the sounds of words across many languages, linguists can learn about the genealogical relationships between languages and the people who speak them, how sounds change through time and even how long-extinct ancestral languages would have sounded. In PNAS, Bouchard-Côté et al. (2) automate this process by using probabilistic models of sound change to trace the evolution of thousands of words across more than 600 Austronesian languages.
Proceedings of the Royal Society B: Biological Sciences 280, 2013
Despite a burgeoning science of cultural evolution, relatively little work has focused on the population structure of human cultural variation. By contrast, studies in human population genetics use a suite of tools to quantify and analyse spatial and temporal patterns of genetic variation within and between populations. Human genetic diversity can be explained largely as a result of migration and drift giving rise to gradual genetic clines, together with some discontinuities arising from geographical and cultural barriers to gene flow. Here, we adapt theory and methods from population genetics to quantify the influence of geography and ethnolinguistic boundaries on the distribution of 700 variants of a folktale in 31 European ethnolinguistic populations. We find that geographical distance and ethnolinguistic affiliation exert significant independent effects on folktale diversity and that variation between populations supports a clustering concordant with European geography. This pattern of geographical clines and clusters parallels the pattern of human genetic diversity in Europe, although the effects of geographical distance and ethnolinguistic boundaries are stronger for folktales than genes. Our findings highlight the importance of geography and population boundaries in models of human cultural variation and point to key similarities and differences between evolutionary processes operating on human genes and culture.
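The abstract says population-genetic methods were adapted to folktale variants but does not name the statistic used. As a hedged illustration of the general idea only, the Python sketch below computes a Nei-style differentiation index, F_ST = (H_T - H_S) / H_T, where H = 1 - Σp² is the chance that two randomly drawn tellings belong to different variants. The function names, the Counter input format, and the equal weighting of populations are assumptions for this example, not the authors' analysis.

```python
from collections import Counter


def diversity(counts):
    """Chance that two variants drawn with replacement differ: H = 1 - sum(p^2)."""
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def fst(populations):
    """Nei-style F_ST = (H_T - H_S) / H_T for a list of Counter objects.

    Each Counter maps a (hypothetical) folktale-variant label to its count
    in one population. Populations are weighted equally (an assumption).
    """
    # H_S: mean within-population diversity
    h_s = sum(diversity(p) for p in populations) / len(populations)
    # H_T: diversity of the pooled, equally weighted variant frequencies
    pooled = Counter()
    for p in populations:
        total = sum(p.values())
        for variant, c in p.items():
            pooled[variant] += c / total / len(populations)
    h_t = diversity(pooled)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t
```

For instance, fst([Counter(a=8, b=2), Counter(a=2, b=8)]) comes out near 0.36: most variation still lies within populations, but a clear share lies between them.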
PNAS 110(21):8471-8476, 2013
The search for ever deeper relationships among the World’s languages is bedeviled by the fact that most words evolve too rapidly to preserve evidence of their ancestry beyond 5,000 to 9,000 y. On the other hand, quantitative modeling indicates that some “ultraconserved” words exist that might be used to find evidence for deep linguistic relationships beyond that time barrier. Here we use a statistical model, which takes into account the frequency with which words are used in common everyday speech, to predict the existence of a set of such highly conserved words among seven language families of Eurasia postulated to form a linguistic superfamily that evolved from a common ancestor around 15,000 y ago. We derive a dated phylogenetic tree of this proposed superfamily with a time-depth of ∼14,450 y, implying that some frequently used words have been retained in related forms since the end of the last ice age. Words used more than once per 1,000 in everyday speech were 7- to 10-times more likely to show deep ancestry on this tree. Our results suggest a remarkable fidelity in the transmission of some words and give theoretical justification to the search for features of language that might be preserved across wide spans of time and geography.
2012
Response to Comments on “Phonemic Diversity Supports a Serial Founder Effect Model of Language Expansion from Africa”
Science 335(6069):657, 2012
Concerns have been raised about my proposal that global phonemic diversity was shaped by a serial founder effect during the human expansion from Africa. I welcome this discussion of new data and alternative interpretations. Although this work highlights ...
Science 337(6097):957-960, 2012
There are two competing hypotheses for the origin of the Indo-European language family. The conventional view places the homeland in the Pontic steppes about 6000 years ago. An alternative hypothesis claims that the languages spread from Anatolia with the expansion of farming 8000 to 9500 years ago. We used Bayesian phylogeographic approaches, together with basic vocabulary data from 103 ancient and contemporary Indo-European languages, to explicitly model the expansion of the family and test these hypotheses. We found decisive support for an Anatolian origin over a steppe origin. Both the inferred timing and root location of the Indo-European language trees fit with an agricultural expansion from Anatolia beginning 8000 to 9500 years ago. These results highlight the critical role that phylogeographic inference can play in resolving debates about human prehistory.
2011
Linking spatial patterns of language variation to ancient demography and population migrations
Linguistic Typology 15(2):321-332, 2011
I am most grateful to those who have contributed to this collection for their insightful comments on my proposal that global variation in phonemic diversity reflects the legacy of a serial founder effect following the human expansion from Africa (Atkinson 2011). The ...
Science 332(6027):346-349, 2011
Human genetic and phenotypic diversity declines with distance from Africa, as predicted by a serial founder effect in which successive population bottlenecks during range expansion progressively reduce diversity, underpinning support for an African origin of modern humans. Recent work suggests that a similar founder effect may operate on human culture and language. Here I show that the number of phonemes used in a global sample of 504 languages is also clinal and fits a serial founder-effect model of expansion from an inferred origin in Africa. This result, which is not explained by more recent demographic history, local language diversity, or statistical non-independence within language families, points to parallel mechanisms shaping genetic and linguistic diversity and supports an African origin of modern human languages.
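The abstract reports that phoneme counts are clinal and fit a serial founder-effect model of expansion from an inferred origin. A minimal sketch of that general idea, assuming planar coordinates and a plain Pearson correlation (not the paper's actual distance calculations or model-selection procedure): for each candidate origin, correlate diversity with distance from it and keep the origin that yields the strongest decline. The function names and inputs are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def best_origin(sites, diversity, candidates):
    """Return (origin, R^2) for the candidate that best fits a decline with distance.

    sites: (x, y) language locations on a plane (a simplifying assumption;
    a real analysis would use geographic distances).
    diversity: phoneme-inventory size, or another diversity score, per site.
    candidates: (x, y) points to test as the expansion origin.
    """
    best = None
    for origin in candidates:
        dists = [math.dist(origin, s) for s in sites]
        r = pearson_r(dists, diversity)
        # A serial founder effect predicts diversity falling with distance,
        # so the most negative correlation marks the best-fitting origin.
        if best is None or r < best[0]:
            best = (r, origin)
    r, origin = best
    return origin, r * r
```

Picking the most negative correlation, rather than the largest R² alone, builds in the model's directional prediction that diversity falls, not rises, with distance from the origin.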
Philosophical Transactions of the Royal Society B: Biological Sciences 366(1567):1090-1100, 2011
Historical inference is at its most powerful when independent lines of evidence can be integrated into a coherent account. Dating linguistic and cultural lineages can potentially play a vital role in the integration of evidence from linguistics, anthropology, archaeology ...
2010
Proceedings of the Royal Society B: Biological Sciences 277(1693):2443-2450, 2010
There are approximately 7000 languages spoken in the world today. This diversity reflects the legacy of thousands of years of cultural evolution. How far back we can trace this history depends largely on the rate at which the different components of language evolve. Rates of lexical evolution are widely thought to impose an upper limit of 6000-10 000 years on reliably identifying language relationships. In contrast, it has been argued that certain structural elements of language are much more stable. Just as biologists use highly conserved genes to uncover the deepest branches in the tree of life, highly stable linguistic features hold the promise of identifying deep relationships between the world's languages. Here, we present the first global network of languages based on this typological information. We evaluate the relative evolutionary rates of both typological and lexical features in the Austronesian and Indo-European language families. The first indications are that typological features evolve at similar rates to basic vocabulary but their evolution is substantially less tree-like. Our results suggest that, while rates of vocabulary change are correlated between the two language families, the rates of evolution of typological features and structural subtypes show no consistent relationship across families.
2008
Science 319(5863):588, 2008
Linguists speculate that human languages often evolve in rapid or punctuational bursts, sometimes associated with their emergence from other languages, but this phenomenon has never been demonstrated. We used vocabulary data from three of the world's major language groups (Bantu, Indo-European, and Austronesian) to show that 10 to 33% of the overall vocabulary differences among these languages arose from rapid bursts of change associated with language-splitting events. Our findings identify a general tendency for increased rates of linguistic evolution in fledgling languages, perhaps arising from a linguistic founder effect or a desire to establish a distinct social identity.
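One simple way to express the punctuational idea: if each language split adds a fixed burst of change, the total change along a root-to-tip path should grow with the number of splits (nodes) on that path. The sketch below fits that line by ordinary least squares and reports what share of the mean path length the per-split term accounts for. It is a simplified illustration with a hypothetical function name; it treats data points as independent and ignores the phylogenetic non-independence and node-density artefacts a real analysis must control for, so it should not be read as the authors' procedure.

```python
def burst_share(node_counts, path_lengths):
    """Fit path_length = a + b * nodes by least squares; return (a, b, share).

    a: change not tied to splits; b: extra change per split (the "burst");
    share = b * mean(nodes) / mean(path length), the fraction of average
    root-to-tip change attributed to splitting events under this toy model.
    """
    n = len(node_counts)
    mx = sum(node_counts) / n
    my = sum(path_lengths) / n
    sxx = sum((x - mx) ** 2 for x in node_counts)
    sxy = sum((x - mx) * (y - my) for x, y in zip(node_counts, path_lengths))
    b = sxy / sxx
    a = my - b * mx
    return a, b, b * mx / my
```

A positive slope b is what a punctuational signal would look like here; a slope near zero means path length does not depend on how many splits a lineage has passed through.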
Science 320(5875):446, 2008
While Noah Webster may have produced the earliest compendium on American English, the divergence from British English dates from much earlier. Long before the publication of Webster's Dictionary in 1806, pronunciation in America and in Britain had begun to differ (1, 2). The Dictionary thus does not mark a fixed point when all Americans shifted abruptly from British to American English. The speciation, rather, was gradual, because individual speakers change gradually, by increments, in their lifetimes; individual changes also spread gradually from speaker to speaker.
Behavioral and Brain Sciences 31(5):529-530, 2008
We suggest there is somewhat more potential than Christiansen & Chater (C&C) allow for genetic adaptations specific to language. Our uniquely cooperative social system requires sophisticated language skills. Learning and performance of some culturally transmitted elements in animals is genetically based, and we give examples of features of human language that evolve slowly enough that genetic adaptations to them may arise.
2007
Nature 449(7163):717-720, 2007
Greek speakers say 'ουρά', Germans 'schwanz' and the French 'queue' to describe what English speakers call a 'tail', but all of these languages use a related form of 'two' to describe the number after one. Among more than 100 Indo-European languages and dialects, the words for some meanings (such as 'tail') evolve rapidly, being expressed across languages by dozens of unrelated words, while others evolve much more slowly, such as the number 'two', for which all Indo-European language speakers use the same related word-form. No general linguistic mechanism has been advanced to explain this striking variation in rates of lexical replacement among meanings. Here we use four large and divergent language corpora (English, Spanish, Russian and Greek) and a comparative database of 200 fundamental vocabulary meanings in 87 Indo-European languages to show that the frequency with which these words are used in modern language predicts their rate of replacement over thousands of years of Indo-European language evolution. Across all 200 meanings, frequently used words evolve at slower rates and infrequently used words evolve more rapidly. This relationship holds separately and identically across parts of speech for each of the four language corpora, and accounts for approximately 50% of the variation in historical rates of lexical replacement. We propose that the frequency with which specific words are used in everyday language exerts a general and law-like influence on their rates of evolution. Our findings are consistent with social models of word change that emphasize the role of selection, and suggest that owing to the ways that humans use language, some words will evolve slowly and others rapidly across all languages.
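The central claim is a regression of each meaning's historical replacement rate on its usage frequency. A minimal sketch, assuming both quantities are positive and that a straight line on log-log axes is the model of interest (the abstract does not state the transformation used): the slope should be negative if frequent words are replaced more slowly, and R² is the share of variance in log rate that frequency explains, the kind of figure the abstract reports as roughly 50%. The function name is hypothetical.

```python
import math


def log_log_fit(frequencies, rates):
    """Fit log(rate) = a + b * log(frequency) by least squares; return (b, R^2)."""
    xs = [math.log(f) for f in frequencies]
    ys = [math.log(r) for r in rates]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r_squared = sxy * sxy / (sxx * syy)
    return slope, r_squared
```

With perfectly power-law input such as rates = [2.0 * f ** -0.5 for f in frequencies], the fit recovers a slope of -0.5 and R² of 1; real corpus counts and inferred rates scatter around the line, which is why R² falls well below 1.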
2006
How Old is the Indo-European Language Family? Illumination or More Moths to the Flame?
Phylogenetic Methods and the Prehistory of Languages 8.0:91-, 2006
... European (the hypothesized ancestral Indo-European tongue) with the Kurgan culture of southern Russia and the Ukraine. The Kurgans were a group of semi-nomadic, pastoralist, warrior-horsemen who expanded from their homeland in the Russian steppes during the ...
2005
Transactions of the Philological Society 103(2):193-219, 2005
Gray & Atkinson's (2003) application of quantitative phylogenetic methods to Dyen, Kruskal & Black's (1992) Indo-European database produced controversial divergence time estimates. Here we test the robustness of these results using an alternative data set of ancient Indo-European languages. We employ two very different stochastic models of lexical evolution: Gray & Atkinson's (2003) finite-sites model and a stochastic-Dollo model of word evolution introduced by Nicholls & Gray (in press). Results of this analysis support the findings of Gray & Atkinson (2003). We also tested the ability of both methods to reconstruct phylogeny and divergence times accurately from synthetic data. The methods performed well under a range of scenarios, including widespread and localized borrowing.
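Under a stochastic-Dollo model as it is usually described, each cognate class is born exactly once somewhere on the tree and can then be lost independently on each descendant lineage, but never regained. The sketch below simulates that process with the Gillespie algorithm on a hypothetical two-leaf tree, assuming a constant birth rate lam (which must be positive) on each lineage and a death rate mu per existing class. It illustrates the model only and is not Nicholls & Gray's inference code; on a long branch the expected number of classes settles near lam / mu.

```python
import itertools
import random


def evolve_branch(traits, length, lam, mu, rng, new_id):
    """Simulate cognate-class births and deaths along one branch (Gillespie algorithm).

    traits: set of class ids present at the start of the branch.
    new_id: shared id counter, so every class is born exactly once (the Dollo property).
    """
    alive = list(traits)
    t = 0.0
    while True:
        total = lam + mu * len(alive)
        t += rng.expovariate(total)          # waiting time to the next event
        if t > length:
            return set(alive)
        if rng.random() < lam / total:
            alive.append(next(new_id))       # birth: a brand-new class appears
        else:
            i = rng.randrange(len(alive))    # death: one existing class is lost
            alive[i] = alive[-1]
            alive.pop()


def simulate_two_leaf_tree(lam, mu, root_len, leaf_len, seed=0):
    """Evolve a root branch, split it, then evolve two leaf branches independently.

    Returns (left, right, boundary); every class born after the split has id > boundary.
    """
    rng = random.Random(seed)
    new_id = itertools.count()
    ancestor = evolve_branch(set(), root_len, lam, mu, rng, new_id)
    boundary = next(new_id)                  # ids issued after this point are post-split
    left = evolve_branch(ancestor, leaf_len, lam, mu, rng, new_id)
    right = evolve_branch(ancestor, leaf_len, lam, mu, rng, new_id)
    return left, right, boundary
```

Because ids come from one shared counter, any class found in both leaves must predate the split; that is the "born once, never regained" constraint that separates Dollo-type models from models in which a lost cognate class can reappear.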
2003
Nature 426(6965):435-439, 2003
Languages, like genes, provide vital clues about human history. The origin of the Indo-European language family is “the most intensively studied, yet still most recalcitrant, problem of historical linguistics”. Numerous genetic studies of Indo-European origins have also produced inconclusive results. Here we analyse linguistic data using computational methods derived from evolutionary biology. We test two theories of Indo-European origin: the 'Kurgan expansion' and the 'Anatolian farming' hypotheses. The Kurgan theory centres on possible archaeological evidence for an expansion into Europe and the Near East by Kurgan horsemen beginning in the sixth millennium BP. In contrast, the Anatolian theory claims that Indo-European languages expanded with the spread of agriculture from Anatolia around 8,000-9,500 years BP. In striking agreement with the Anatolian hypothesis, our analysis of a matrix of 87 languages with 2,449 lexical items produced an estimated age range for the initial Indo-European divergence of between 7,800 and 9,800 years BP. These results were robust to changes in coding procedures, calibration points, rooting of the trees and priors in the Bayesian analysis.