Language Evolution and Computation Bibliography

Soren Wichmann
2012
PLoS ONE 7(4):e35025, 2012

Background

Recent advances in automated assessment of basic vocabulary lists allow the construction of linguistic phylogenies useful for tracing dynamics of human population expansions, reconstructing ancestral cultures, and modeling transition rates of cultural traits over time.

Methods

Here we investigate the Tupi expansion, a widely-dispersed language family in lowland South America, with a distance-based phylogeny based on 40-word vocabulary lists from 48 languages. We coded 11 cultural traits across the diverse Tupi family including traditional warfare patterns, post-marital residence, corporate structure, community size, paternity beliefs, sibling terminology, presence of canoes, tattooing, shamanism, men's houses, and lip plugs.

Results/Discussion

The linguistic phylogeny supports a Tupi homeland in west-central Brazil with subsequent major expansions across much of lowland South America. Consistently, ancestral reconstructions of cultural traits over the linguistic phylogeny suggest that social complexity has tended to decline through time, most notably in the independent emergence of several nomadic hunter-gatherer societies. Estimated rates of cultural change across the Tupi expansion are on the order of only a few changes per 10,000 years, in accord with previous cultural phylogenetic results in other language families around the world, and indicate a conservative nature to much of human culture.
2011
Automated dating of the world's language families based on lexical similarity
Current Anthropology 52(6):841-875, 2011
This paper describes a computerized alternative to glottochronology for estimating elapsed time since parent languages diverged into daughter languages. The method, developed by the Automated Similarity Judgment Program (ASJP) consortium, is different from ...
2010
Entropy 12(4):844-858, 2010
The relationship between meanings of words and their sound shapes is to a large extent arbitrary, but it is well known that languages exhibit sound symbolism effects violating arbitrariness. Evidence for sound symbolism is typically anecdotal, however. Here we present a systematic approach. Using a selection of basic vocabulary in nearly one half of the world's languages we find commonalities among sound shapes for words referring to same concepts. These are interpreted as due to sound symbolism. Studying the effects of sound symbolism cross-linguistically is of key importance for the understanding of language evolution.
2009
Human Biology 81(2-3):259-274, 2009
Previous empirical studies of population size and language change have produced equivocal results. We therefore address the question with a new set of lexical data from nearly one half of the world's languages. We first show that relative population sizes of modern languages can be extrapolated to ancestral languages, albeit with diminishing accuracy, up to several thousand years into the past. We then test for an effect of population against the null hypothesis that the ultrametric inequality is satisfied by lexical distances among triples of related languages. The test shows mainly negligible effects of population, the exception being an apparently faster rate of change in the larger of two very closely related variants. A possible explanation for the exception may be the influence on emerging standard (or cross-regional) variants from speakers that shift from different dialects to the standard. Our results strongly indicate that the sizes of speaker populations do not in and of themselves determine rates of language change. Comparison of this empirical finding with previously published computer simulations suggests that the most plausible model for language change is one in which changes propagate at a local level in a type of network where the individuals have different degrees of connectivity.
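The ultrametric inequality mentioned in this abstract can be illustrated with a small sketch. For three pairwise distances, the inequality holds exactly when the two largest distances are equal, so the gap between them gives a simple measure of deviation. This is only an illustration of the inequality itself, not the authors' statistical test; the function name and the example distances are hypothetical.

```python
def ultrametric_violation(d_ab, d_ac, d_bc):
    """Return the gap between the two largest of three pairwise distances.

    A triple satisfies the ultrametric inequality when this gap is zero,
    i.e. when the two closest languages are equally distant from the third.
    """
    smallest, middle, largest = sorted((d_ab, d_ac, d_bc))
    return largest - middle


# Hypothetical lexical distances for languages A, B (close pair) and C.
equal_rates = ultrametric_violation(0.2, 0.7, 0.7)    # A and B equally far from C
unequal_rates = ultrametric_violation(0.2, 0.6, 0.7)  # B has drifted further from C
```

A nonzero gap for a triple would be consistent with one of the two closely related languages having changed faster than the other, which is the kind of signal a population-size effect would need to produce.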
2008
Journal of Linguistics 44(3):659-675, 2008
This paper presents computer simulations of language populations and the development of language families, showing how a simple model can lead to distributions similar to those observed empirically by Wichmann (2005) and others. The model combines features of two models used in earlier work for the simulation of competition among languages: the 'Viviane' model for the migration of peoples and the propagation of languages, and the 'Schulze' model, which uses bit-strings as a way of characterising structural features of languages.
Birth, survival and death of languages by Monte Carlo simulation
Communications in Computational Physics 3(2):271-294, 2008
Simulations mostly by physicists of the competition between adult languages since 2003 are reviewed. The Viviane and Schulze models give good and reasonable agreement, respectively, with the empirical histogram of language sizes. The number of different languages within one language family is also modeled reasonably in an intermediate range. Bilingualism is now incorporated into the Schulze model. Also the rate at which the majority shifts from one language to another is found to be nearly independent of the population size, or to depend strongly on it, according to details of the Schulze model. Other simulations, like Nettle-Culicover-Nowak, are reviewed more briefly.
Advances in Complex Systems 11(3):357-369, 2008
An earlier study [24] concluded, based on computer simulations and some inferences from empirical data, that languages will change the more slowly the larger the population gets. We replicate this study using a more complete language model for simulations (the Schulze model combined with a Barabasi-Albert network) and a richer empirical dataset [12]. Our simulations show either a negligible or a strong dependence of language change on population sizes, depending on the parameter settings; while empirical data, like some of the simulations, show a negligible dependence.
Language and Linguistics Compass 2(3):442-455, 2008
Large linguistic databases, especially databases having a global coverage, such as the World Atlas of Language Structures, the Automated Similarity Judgment Program, and Ethnologue, are making it possible to systematically investigate many aspects of how languages change and compete for viability. Agent-based computer simulations supplement such empirical data by analyzing the necessary and sufficient parameters for the current global distributions of languages or linguistic features. By combining empirical datasets with simulations and applying quantitative methods, it is now possible to address fundamental questions, such as 'what are the relative rates of change in different parts of languages?', 'why are there a few large language families, many intermediate ones, and even more small ones?', 'do small languages change faster or slower than large ones?', or 'how does the borrowing of words relate to the borrowing of structural features?'
Language and Linguistics Compass 2(6):1294-1297, 2008
The field of language dynamics encompasses the study and modeling of how languages develop (language evolution), change, and interact (language competition). It contrasts with traditional historical linguistics in several ways: the focus is on the world's linguistic diversity rather than just on specific languages or language families; methods are quantitative rather than qualitative; computer simulations are employed for elucidating situations that are not immediately observable, being too complex or pertaining to prehistory; the data used are systematic ones gathered in large databases rather than data that happen to be available for select languages. A crucial feature of the methodology is the fine-tuning of simulation models through empirical observations of quantitative distributions such as those of speaker populations or of grammatical features shared among languages.
2007
Linguistic Typology 11(2):395-423, 2007
Modern linguistic typology is increasingly less concerned with what is possible in human languages (universals) and increasingly more with the question "what's where why?" (Bickel 2007). Moreover, as several recent papers in this journal show, typologists increasingly turn to quantitative approaches as a means to understanding typological distributions. In order to provide the quantitative study of typological distributions with a firm methodological foundation it is preferable to gain a grasp of simple facts before starting to ask the more complicated questions. In this article the only assumptions we make about languages are that (i) they may be partly described by a set of typological characteristics, each of which may either be found or not found in any given language; that (ii) languages may be genealogically related or not; and that (iii) languages are spoken in certain places. Given these minimal assumptions we can begin to ask how to express the differences and similarities among languages as functions of the geographical distances among them, whether different functions apply to genealogically related and unrelated languages, and whether it is possible to distinguish in some quantitative way between languages that are related and languages that are not, even when the languages in question are spoken at great distances from one another. Moreover, we may investigate the effects that factors such as ecology, migration, and rates of linguistic change or diffusion have on the degree of similarities among languages in cases where they are either related or unrelated. We will approach these questions from two perspectives. The first perspective is an empirical one, where observations primarily derive from analyses of the data of Haspelmath et al. (eds.) (2005). The second perspective is a computational one, where simulations are drawn upon to test the effects of different parameters on the development of structural linguistic diversity.
Transactions of the Philological Society 105(2):126-147, 2007
This paper presents the results of the application of a bit-string model of languages (Schulze and Stauffer 2005) to problems of taxonomic patterns. The questions addressed include the following: (1) Which parameters are minimally needed for the development of a taxonomic dynamics leading to the type of distribution of language family sizes currently attested (as measured in the number of languages per family), which appears to be a power-law? (2) How may such a model be coupled with one of the dynamics of speaker populations leading to the type of language size seen today, which appears to follow a log-normal distribution?
How to use typological databases in historical linguistic research
Diachronica 24(2):373-404, 2007
Abstract: Several databases have been compiled with the aim of documenting the distribution of typological features across the world's languages. This paper looks at ways of utilizing this type of data for making inferences concerning genealogical relationships by ...
2006
Physica A: Statistical Mechanics and its Applications 371(2):719-724, 2006
The bit-string model of Schulze and Stauffer (2005) is applied to non-equilibrium situations and then gives better agreement with the empirical distribution of language sizes. Here the size is the number of people having this language as mother tongue. In contrast, when equilibrium is combined with irreversible mutations of languages, one language always dominates and is spoken by at least 80 percent of the population.
2005
Journal of Linguistics 41(1):117-131, 2005
When the sizes of language families of the world, measured by the number of languages contained in each family, are plotted in descending order on a diagram where the x-axis represents the place of each family in the rank-order (the largest family having rank 1, the next-largest, rank 2, and so on) and the y-axis represents the number of languages in the family determining the rank-ordering, it is seen that the distribution closely approximates a curve defined by the formula $y=ax^{-b}$. Such 'power-law' distributions are known to characterize a wide range of social, biological, and physical phenomena and are essentially of a stochastic nature. It is suggested that the apparent power-law distribution of language family sizes is of relevance when evaluating overall classifications of the world's languages, for the analysis of taxonomic structures, for developing hypotheses concerning the prehistory of the world's languages, and for modelling the future extinction of language families.
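As a rough illustration of the rank-size relation in this abstract, the sketch below estimates a and b by ordinary least squares on the log-transformed form, log y = log a - b log x. It is a minimal sketch, not the procedure used in the paper: the function name is hypothetical, the family sizes are synthetic values generated from an exact power law, and a log-log regression is only one of several ways to fit such a curve.

```python
import math


def fit_power_law(sizes):
    """Fit y = a * x**(-b) to rank-ordered sizes via log-log least squares.

    `sizes` holds positive family sizes (at least two values); rank 1 is
    assigned to the largest family, rank 2 to the next-largest, and so on.
    """
    ranked = sorted(sizes, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(size) for size in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The slope of the log-log line is -b; the intercept is log a.
    return math.exp(intercept), -slope


# Synthetic sizes drawn from an exact power law (a = 1000, b = 1.5),
# used here only to check that the fit recovers the parameters.
synthetic_sizes = [1000 * rank ** -1.5 for rank in range(1, 51)]
a, b = fit_power_law(synthetic_sizes)
```

With real family counts the points would scatter around the fitted line, so the recovered a and b should be read as descriptive summaries of the distribution.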