Language Evolution and Computation Bibliography

Our site (www.isrl.uiuc.edu/amag/langev) has been retired; please use https://langev.com instead.
Simon J. Greenhill
2018
Front. Psychol. 9:317-335, 2018
What role does speaker population size play in shaping rates of language evolution? There has been little consensus on the expected relationship between rates and patterns of language change and speaker population size, with some predicting faster rates of change in smaller populations, and others expecting greater change in larger populations. The growth of comparative databases has allowed population size effects to be investigated across a wide range of language groups, with mixed results. One recent study of a group of Polynesian languages revealed greater rates of word gain in larger populations and greater rates of word loss in smaller populations. However, that test was restricted to 20 closely related languages from small Oceanic islands. Here, we test if this pattern is a general feature of language evolution across a larger and more diverse sample of languages from both continental and island populations. We analyzed comparative language data for 153 pairs of closely-related sister languages from three of the world's largest language families: Austronesian, Indo-European, and Niger-Congo. We find some evidence that rates of word loss are significantly greater in smaller languages for the Indo-European comparisons, but we find no significant patterns in the other two language families. These results suggest either that the influence of population size on rates and patterns of language evolution is not universal, or that it is sufficiently weak that it may be overwhelmed by other influences in some cases. Further investigation, for a greater number of language comparisons and a wider range of language features, may determine which of these explanations holds true.
Scientific Data 5(180205), 2018
The amount of available digital data for the languages of the world is constantly increasing. Unfortunately, most of the digital data are provided in a large variety of formats and therefore not amenable for comparison and re-use. The Cross-Linguistic Data Formats initiative proposes new standards for two basic types of data in historical and typological language comparison (word lists, structural datasets) and a framework to incorporate more data types (e.g. parallel texts, and dictionaries). The new specification for cross-linguistic data formats comes along with a software package for validation and manipulation, a basic ontology which links to more general frameworks, and usage examples of best practices.
Journal of Language Evolution 3(2):130-144, 2018
With increasing amounts of digitally available data from all over the world, manual annotation of cognates in multi-lingual word lists becomes more and more time-consuming in historical linguistics. Using available software packages to pre-process the data prior to manual analysis can drastically speed up the process of cognate detection. Furthermore, it allows us to get a quick overview of data which have not yet been intensively studied by experts. LingPy is a Python library which provides a large arsenal of routines for sequence comparison in historical linguistics. With LingPy, linguists can not only automatically search for cognates in lexical data, but they can also align the automatically identified words, and output them in various forms, which aim at facilitating manual inspection. In this tutorial, we will briefly introduce the basic concepts behind the algorithms employed by LingPy and then illustrate in concrete workflows how automatic sequence comparison can be applied to multi-lingual word lists. The goal is to provide the readers with all information they need to (1) carry out cognate detection and alignment analyses in LingPy, (2) select the appropriate algorithms for the appropriate task, (3) evaluate how well automatic cognate detection algorithms perform compared to experts, and (4) export their data into various formats useful for additional analyses or data sharing. While basic knowledge of the Python language is useful for all analyses, our tutorial is structured in such a way that scholars with basic knowledge of computing can follow through all steps as well.
Journal of Language Evolution 3(2):91-93, 2018
Unlike a standard online experiment, a gaming app lets participants interact freely with a vast number of partners, as many times as they wish. The gain is not merely one of statistical power. Cultural evolutionists can use gaming apps to allow large numbers of participants to communicate synchronously; to build realistic transmission chains that avoid the losses of information that occur in linear chains; and to study the effects of partner choice as well as partner control in social interactions. We are releasing an app designed to take advantage of these opportunities and generate realistic language evolution dynamics.
MPI Max Planck Society, 2018
Can we communicate across the barrier of languages, with images instead of sounds? The scientists behind the color game will document the evolution of a new kind of language, a language beyond words. They will explore the way that new symbols emerge, acquire a meaning, or change their meaning, over time and across space. Will the color game give birth to different dialects, languages that only some people can understand but not others? Will the images of the color game evolve in the same way that words for colour evolved through human history? These are some of the questions that the creators of the Color Game hope to answer.
2015
PNAS 112(7):2097-2102, 2015
The effect of population size on patterns and rates of language evolution is controversial. Do languages with larger speaker populations change faster due to a greater capacity for innovation, or do smaller populations change faster due to more efficient diffusion of innovations? Do smaller populations suffer greater loss of language elements through founder effects or drift, or do languages with more speakers lose features due to a process of simplification? Revealing the influence of population size on the tempo and mode of language evolution not only will clarify underlying mechanisms of language change but also has practical implications for the way that language data are used to reconstruct the history of human cultures. Here, we provide, to our knowledge, the first empirical, statistically robust test of the influence of population size on rates of language evolution, controlling for the evolutionary history of the populations and formally comparing the fit of different models of language evolution. We compare rates of gain and loss of cognate words for basic vocabulary in Polynesian languages, an ideal test case with a well-defined history. We demonstrate that larger populations have higher rates of gain of new words whereas smaller populations have higher rates of word loss. These results show that demographic factors can influence rates of language evolution and that rates of gain and loss are affected differently. These findings are strikingly consistent with general predictions of evolutionary models.
2013
Proceedings of the Royal Society B: Biological Sciences 280, 2013
Despite a burgeoning science of cultural evolution, relatively little work has focused on the population structure of human cultural variation. By contrast, studies in human population genetics use a suite of tools to quantify and analyse spatial and temporal patterns of genetic variation within and between populations. Human genetic diversity can be explained largely as a result of migration and drift giving rise to gradual genetic clines, together with some discontinuities arising from geographical and cultural barriers to gene flow. Here, we adapt theory and methods from population genetics to quantify the influence of geography and ethnolinguistic boundaries on the distribution of 700 variants of a folktale in 31 European ethnolinguistic populations. We find that geographical distance and ethnolinguistic affiliation exert significant independent effects on folktale diversity and that variation between populations supports a clustering concordant with European geography. This pattern of geographical clines and clusters parallels the pattern of human genetic diversity in Europe, although the effects of geographical distance and ethnolinguistic boundaries are stronger for folktales than genes. Our findings highlight the importance of geography and population boundaries in models of human cultural variation and point to key similarities and differences between evolutionary processes operating on human genes and culture.
2012
Science 337(6097):957-960, 2012
There are two competing hypotheses for the origin of the Indo-European language family. The conventional view places the homeland in the Pontic steppes about 6000 years ago. An alternative hypothesis claims that the languages spread from Anatolia with the expansion of farming 8000 to 9500 years ago. We used Bayesian phylogeographic approaches, together with basic vocabulary data from 103 ancient and contemporary Indo-European languages, to explicitly model the expansion of the family and test these hypotheses. We found decisive support for an Anatolian origin over a steppe origin. Both the inferred timing and root location of the Indo-European language trees fit with an agricultural expansion from Anatolia beginning 8000 to 9500 years ago. These results highlight the critical role that phylogeographic inference can play in resolving debates about human prehistory.
2011
Nature, 2011
Languages vary widely but not without limit. The central goal of linguistics is to describe the diversity of human languages and explain the constraints on that diversity. Generative linguists following Chomsky have claimed that linguistic diversity must be constrained by innate parameters that are set as a child learns a language. In contrast, other linguists following Greenberg have claimed that there are statistical tendencies for co-occurrence of traits reflecting universal systems biases, rather than absolute constraints or parametric variation. Here we use computational phylogenetic methods to address the nature of constraints on linguistic diversity in an evolutionary framework. First, contrary to the generative account of parameter setting, we show that the evolution of only a few word-order features of languages are strongly correlated. Second, contrary to the Greenbergian generalizations, we show that most observed functional dependencies between traits are lineage-specific rather than universal tendencies. These findings support the view that, at least with respect to word order, cultural evolution is the primary factor that determines linguistic structure, with the current state of a linguistic system shaping and constraining future states.
Philosophical Transactions of the Royal Society B: Biological Sciences 366(1567):1090-1100, 2011
Historical inference is at its most powerful when independent lines of evidence can be integrated into a coherent account. Dating linguistic and cultural lineages can potentially play a vital role in the integration of evidence from linguistics, anthropology, archaeology ...
Universal typological dependencies should be detectable in the history of language families
Linguistic Typology 15(2):509-534, 2011
We claim that making sense of the typological diversity of languages demands a historical/evolutionary approach. We are pleased that the target paper (Dunn et al. 2011a) has served to bring discussion of this claim into prominence, and are grateful that leading ...
2010
Nature 467:801-804, 2010
There is disagreement about whether human political evolution has proceeded through a sequence of incremental increases in complexity, or whether larger, non-sequential increases have occurred. The extent to which societies have decreased in complexity is also unclear. These debates have continued largely in the absence of rigorous, quantitative tests. We evaluated six competing models of political evolution in Austronesian-speaking societies using phylogenetic methods. Here we show that in the best-fitting model political complexity rises and falls in a sequence of small steps. This is closely followed by another model in which increases are sequential but decreases can be either sequential or in bigger drops. The results indicate that large, non-sequential jumps in political complexity have not occurred during the evolutionary history of these societies. This suggests that, despite the numerous contingent pathways of human history, there are regularities in cultural evolution that can be detected using computational phylogenetic methods.
Philosophical Transactions of the Royal Society B: Biological Sciences 365(1559):3903-3912, 2010
Phylogenetic comparative methods (PCMs) provide a potentially powerful toolkit for testing hypotheses about cultural evolution. Here, we build on previous simulation work to assess the effect horizontal transmission between cultures has on the ability of both ...
Philosophical Transactions of the Royal Society B: Biological Sciences 365(1559):3923-3933, 2010
In this paper we outline two debates about the nature of human cultural history. The first focuses on the extent to which human history is tree-like (its shape), and the second on the unity of that history (its fabric). Proponents of cultural phylogenetics are often accused of assuming that human history has been both highly tree-like and consisting of tightly linked lineages. Critics have pointed out obvious exceptions to these assumptions. Instead of a priori dichotomous disputes about the validity of cultural phylogenetics, we suggest that the debate is better conceptualized as involving positions along continuous dimensions. The challenge for empirical research is, therefore, to determine where particular aspects of culture lie on these dimensions. We discuss the ability of current computational methods derived from evolutionary biology to address these questions. These methods are then used to compare the extent to which lexical evolution is tree-like in different parts of the world and to evaluate the coherence of cultural and linguistic lineages.
Proceedings of the Royal Society B: Biological Sciences 277(1693):2443-2450, 2010
There are approximately 7000 languages spoken in the world today. This diversity reflects the legacy of thousands of years of cultural evolution. How far back we can trace this history depends largely on the rate at which the different components of language evolve. Rates of lexical evolution are widely thought to impose an upper limit of 6000-10,000 years on reliably identifying language relationships. In contrast, it has been argued that certain structural elements of language are much more stable. Just as biologists use highly conserved genes to uncover the deepest branches in the tree of life, highly stable linguistic features hold the promise of identifying deep relationships between the world's languages. Here, we present the first global network of languages based on this typological information. We evaluate the relative evolutionary rates of both typological and lexical features in the Austronesian and Indo-European language families. The first indications are that typological features evolve at similar rates to basic vocabulary but their evolution is substantially less tree-like. Our results suggest that, while rates of vocabulary change are correlated between the two language families, the rates of evolution of typological features and structural subtypes show no consistent relationship across families.
PLoS ONE 5(3):e9573, 2010
We recently used computational phylogenetic methods on lexical data to test between two scenarios for the peopling of the Pacific. Our analyses of lexical data supported a pulse-pause scenario of Pacific settlement in which the Austronesian speakers originated in Taiwan around 5,200 years ago and rapidly spread through the Pacific in a series of expansion pulses and settlement pauses. We claimed that there was high congruence between traditional language subgroups and those observed in the language phylogenies, and that the estimated age of the Austronesian expansion at 5,200 years ago was consistent with the archaeological evidence. However, the congruence between the language phylogenies and the evidence from historical linguistics was not quantitatively assessed using tree comparison metrics. The robustness of the divergence time estimates to different calibration points was also not investigated exhaustively. Here we address these limitations by using a systematic tree comparison metric to calculate the similarity between the Bayesian phylogenetic trees and the subgroups proposed by historical linguistics, and by re-estimating the age of the Austronesian expansion using only the most robust calibrations. The results show that the Austronesian language phylogenies are highly congruent with the traditional subgroupings, and the date estimates are robust even when calculated using a restricted set of historical calibrations.
2009
Science 323(5913):479-483, 2009
Debates about human prehistory often center on the role that population expansions play in shaping biological and cultural diversity. Hypotheses on the origin of the Austronesian settlers of the Pacific are divided between a recent 'pulse-pause' expansion from Taiwan and an older 'slow-boat' diffusion from Wallacea. We used lexical data and Bayesian phylogenetic methods to construct a phylogeny of 400 languages. In agreement with the pulse-pause scenario, the language trees place the Austronesian origin in Taiwan approximately 5230 years ago and reveal a series of settlement pauses and expansion pulses linked to technological and social innovations. These results are robust to assumptions about the rooting and calibration of the trees and demonstrate the combined power of linguistic scholarship, database technologies, and computational phylogenetic methods for resolving questions about human prehistory.
Proceedings of the Royal Society B: Biological Sciences 276(1665):2299-2306, 2009
Phylogenetic methods have recently been applied to studies of cultural evolution. However, it has been claimed that the large amount of horizontal transmission that sometimes occurs between cultural groups invalidates the use of these methods. Here, we use a natural model of linguistic evolution to simulate borrowing between languages. The results show that tree topologies constructed with Bayesian phylogenetic methods are robust to realistic levels of borrowing. Inferences about divergence dates are slightly less robust and show a tendency to underestimate dates. Our results demonstrate that realistic levels of reticulation between cultures do not invalidate a phylogenetic approach to cultural and linguistic evolution.
Proceedings of the Royal Society B: Biological Sciences 276(1664):1957-1964, 2009
The nature of social life in human prehistory is elusive, yet knowing how kinship systems evolve is critical for understanding population history and cultural diversity. Post-marital residence rules specify sex-specific dispersal and kin association, influencing the ...
2008
Science 319(5863):588, 2008
Linguists speculate that human languages often evolve in rapid or punctuational bursts, sometimes associated with their emergence from other languages, but this phenomenon has never been demonstrated. We used vocabulary data from three of the world's major language groups -- Bantu, Indo-European, and Austronesian -- to show that 10 to 33% of the overall vocabulary differences among these languages arose from rapid bursts of change associated with language-splitting events. Our findings identify a general tendency for increased rates of linguistic evolution in fledgling languages, perhaps arising from a linguistic founder effect or a desire to establish a distinct social identity.
The Austronesian Basic Vocabulary Database: From Bioinformatics to Lexomics
Evolutionary Bioinformatics 4:271-283, 2008
Phylogenetic methods have revolutionised evolutionary biology and have recently been applied to studies of linguistic and cultural evolution. However, the basic comparative data on the languages of the world required for these analyses is often widely dispersed in hard to obtain sources. Here we outline how our Austronesian Basic Vocabulary Database (ABVD) helps remedy this situation by collating wordlists from over 500 languages into one web-accessible database. We describe the technology underlying the ABVD and discuss the benefits that an evolutionary bioinformatic approach can provide. These include facilitating computational comparative linguistic research, answering questions about human prehistory, enabling syntheses with genetic data, and safe-guarding fragile linguistic information.
Science 320(5875):446, 2008
While Noah Webster may have produced the earliest compendium on American English, the divergence from British English dates from much earlier. Long before the publication of Webster's Dictionary in 1806, pronunciation in America and in Britain had begun to differ (1, 2). The Dictionary thus does not mark a fixed point when all Americans shifted abruptly from British to American English. The speciation, rather, was gradual, because individual speakers change gradually, by increments, in their lifetimes; individual changes also spread gradually from speaker to speaker.
2007
The pleasures and perils of Darwinizing culture (with phylogenies)
Biological Theory 2(4):360-375, 2007
Current debates about “Darwinizing culture” have typically focused on the validity of memetics. In this article we argue that meme-like inheritance is not a necessary requirement for descent with modification. We suggest that an alternative and more productive way of ...