Language Evolution and Computation Bibliography

Our site (www.isrl.uiuc.edu/amag/langev) has been retired; please use https://langev.com instead.
Mark Pagel
2017
BMC Biology 15:1070-94, 2017
Human language is unique among all forms of animal communication. It is unlikely that any other species, including our close genetic cousins the Neanderthals, ever had language, and so-called sign ‘language’ in Great Apes is nothing like human language. Language evolution shares many features with biological evolution, and this has made it useful for tracing recent human history and for studying how culture evolves among groups of people with related languages. A case can be made that language has played a more important role in our species’ recent (circa last 200,000 years) evolution than have our genes.
Psychonomic bulletin & review 24:151-157, 2017
Human languages evolve by a process of descent with modification in which parent languages give rise to daughter languages over time and in a manner that mimics the evolution of biological species. Descent with modification is just one of many parallels between biological and linguistic evolution that, taken together, offer up a Darwinian perspective on how languages evolve. Combined with statistical methods borrowed from evolutionary biology, this Darwinian perspective has brought new opportunities to the study of the evolution of human languages. These include the statistical inference of phylogenetic trees of languages, the study of how linguistic traits evolve over thousands of years of language change, the reconstruction of ancestral or proto-languages, and using language change to date historical events.
2015
Current Biology 25:1-9, 2015
BACKGROUND Concerted evolution is normally used to describe parallel changes at different sites in a genome, but it is also observed in languages where a specific phoneme changes to the same other phoneme in many words in the lexicon—a phenomenon known as regular sound change. We develop a general statistical model that can detect concerted changes in aligned sequence data and apply it to study regular sound changes in the Turkic language family. RESULTS Linguistic evolution, unlike the genetic substitutional process, is dominated by events of concerted evolutionary change. Our model identified more than 70 historical events of regular sound change that occurred throughout the evolution of the Turkic language family, while simultaneously inferring a dated phylogenetic tree. Including regular sound changes yielded an approximately 4-fold improvement in the characterization of linguistic change over a simpler model of sporadic change, improved phylogenetic inference, and returned more reliable and plausible dates for events on the phylogenies. The historical timings of the concerted changes closely follow a Poisson process model, and the sound transition networks derived from our model mirror linguistic expectations. CONCLUSIONS We demonstrate that a model with no prior knowledge of complex concerted or regular changes can nevertheless infer the historical timings and genealogical placements of events of concerted change from the signals left in contemporary data. Our model can be applied wherever discrete elements—such as genes, words, cultural trends, technologies, or morphological traits—can change in parallel within an organism or other evolving group.
2013
PNAS 110(21):8471-8476, 2013
The search for ever deeper relationships among the World’s languages is bedeviled by the fact that most words evolve too rapidly to preserve evidence of their ancestry beyond 5,000 to 9,000 y. On the other hand, quantitative modeling indicates that some “ultraconserved” words exist that might be used to find evidence for deep linguistic relationships beyond that time barrier. Here we use a statistical model, which takes into account the frequency with which words are used in common everyday speech, to predict the existence of a set of such highly conserved words among seven language families of Eurasia postulated to form a linguistic superfamily that evolved from a common ancestor around 15,000 y ago. We derive a dated phylogenetic tree of this proposed superfamily with a time-depth of ∼14,450 y, implying that some frequently used words have been retained in related forms since the end of the last ice age. Words used more than once per 1,000 in everyday speech were 7- to 10-times more likely to show deep ancestry on this tree. Our results suggest a remarkable fidelity in the transmission of some words and give theoretical justification to the search for features of language that might be preserved across wide spans of time and geography.
BioEssays, 2013
The Homeric epics are among the greatest masterpieces of literature, but when they were produced is not known with certainty. Here we apply evolutionary-linguistic phylogenetic statistical methods to differences in Homeric, Modern Greek and ancient Hittite vocabulary items to estimate a date of approximately 710–760 BCE for these great works. Our analysis compared a common set of vocabulary items among the three pairs of languages, recording for each item whether the words in the two languages were cognate – derived from a shared ancestral word – or not. We then used a likelihood-based Markov chain Monte Carlo procedure to estimate the most probable times in years separating these languages given the percentage of words they shared, combined with knowledge of the rates at which different words change. Our date for the epics is in close agreement with historians' and classicists' beliefs derived from historical and archaeological sources. The Iliad's story of the Trojan Wars tells us that the epics were almost certainly produced sometime after the 12th century BCE – if indeed the wars were ever fought – but the question is how much later? Herodotus thought considerably later: Writing in the Histories Book II.53 around 450 BCE, he stated that Homer ‘lived, as I believe, not more than 400 years ago’. The most commonly accepted date among modern classicists, drawing on historical, literary and archaeological analyses, is around the mid-8th century BCE (1, 2), although some authors propose a more recent 7th century BCE date (3). Here, we investigate whether formal statistical modelling of languages can help to inform this historical question. In particular, we investigate whether evolutionary-linguistic statistical methods can be usefully applied to differences in Homeric, Modern Greek and ancient Hittite vocabulary items to provide a date for these great works.
2012
Wired for culture: origins of the human social mind
WW Norton, 2012
A fascinating, far-reaching study of how our species' innate capacity for culture altered the course of our social and evolutionary history. A unique trait of the human species is that our personalities, lifestyles, and worldviews are shaped by an accident of birth—namely, the ...
2011
Philosophical Transactions of the Royal Society B: Biological Sciences 366(1567):1101-1107, 2011
We present data from 17 languages on the frequency with which a common set of words is used in everyday language. The languages are drawn from six language families representing 65 per cent of the world's 7000 languages. Our data were collected from ...
2009
Nature Reviews Genetics 10:405-415, 2009
Human languages form a distinct and largely independent class of cultural replicators with behaviour and fidelity that can rival that of genes. Parallels between biological and linguistic evolution mean that statistical methods inspired by phylogenetics and comparative biology are being increasingly applied to study language. Phylogenetic trees constructed from linguistic elements chart the history of human cultures, and comparative studies reveal surprising and general features of how languages evolve, including patterns in the rates of evolution of language elements and social factors that influence temporal trends of language evolution. For many comparative questions of anthropology and human behavioural ecology, historical processes estimated from linguistic phylogenies may be more relevant than those estimated from genes.
2008
Science 319(5863):588, 2008
Linguists speculate that human languages often evolve in rapid or punctuational bursts, sometimes associated with their emergence from other languages, but this phenomenon has never been demonstrated. We used vocabulary data from three of the world's major language groups (Bantu, Indo-European, and Austronesian) to show that 10 to 33% of the overall vocabulary differences among these languages arose from rapid bursts of change associated with language-splitting events. Our findings identify a general tendency for increased rates of linguistic evolution in fledgling languages, perhaps arising from a linguistic founder effect or a desire to establish a distinct social identity.
Science 320(5875):446, 2008
While Noah Webster may have produced the earliest compendium on American English, the divergence from British English dates from much earlier. Long before the publication of Webster's Dictionary in 1806, pronunciation in America and in Britain had begun to differ (1, 2). The Dictionary thus does not mark a fixed point when all Americans shifted abruptly from British to American English. The speciation, rather, was gradual, because individual speakers change gradually, by increments, in their lifetimes; individual changes also spread gradually from speaker to speaker.
Behavioral and Brain Sciences 31(5):529-530, 2008
We suggest there is somewhat more potential than Christiansen & Chater (C&C) allow for genetic adaptations specific to language. Our uniquely cooperative social system requires sophisticated language skills. Learning and performance of some culturally transmitted elements in animals is genetically based, and we give examples of features of human language that evolve slowly enough that genetic adaptations to them may arise.
2007
Nature 449(7163):717-720, 2007
Greek speakers say 'ουρά', Germans 'schwanz' and the French 'queue' to describe what English speakers call a 'tail', but all of these languages use a related form of 'two' to describe the number after one. Among more than 100 Indo-European languages and dialects, the words for some meanings (such as 'tail') evolve rapidly, being expressed across languages by dozens of unrelated words, while others evolve much more slowly, such as the number 'two', for which all Indo-European language speakers use the same related word-form. No general linguistic mechanism has been advanced to explain this striking variation in rates of lexical replacement among meanings. Here we use four large and divergent language corpora (English, Spanish, Russian and Greek) and a comparative database of 200 fundamental vocabulary meanings in 87 Indo-European languages to show that the frequency with which these words are used in modern language predicts their rate of replacement over thousands of years of Indo-European language evolution. Across all 200 meanings, frequently used words evolve at slower rates and infrequently used words evolve more rapidly. This relationship holds separately and identically across parts of speech for each of the four language corpora, and accounts for approximately 50% of the variation in historical rates of lexical replacement. We propose that the frequency with which specific words are used in everyday language exerts a general and law-like influence on their rates of evolution. Our findings are consistent with social models of word change that emphasize the role of selection, and suggest that owing to the ways that humans use language, some words will evolve slowly and others rapidly across all languages.
2006
Estimating Rates of Lexical Replacement on Phylogenetic Trees of Languages
Phylogenetic Methods and the Prehistory of Languages 15.0:173-182, 2006
2000
The history, rate and pattern of world linguistic evolution
The Evolutionary Emergence of Language: Social Function and the Origins of Linguistic Form, 2000
Seven thousand or more different languages may currently be spoken around the world (Grimes 1988; Ruhlen 1991). This is more different languages spoken by a single mammalian species than there are mammalian species. Seven thousand different ...
1995
Spatial structure and the evolution of honest cost-free signalling
Proceedings of the Royal Society B: Biological Sciences 260:365-372, 1995
Models of animal signalling stress that among unrelated individuals the transfer of honest information normally requires that signals are costly, and costly in a way related to the true information revealed by the signal. In the absence of such a cost, ‘cheats’ that lie about their states or needs are able to evolve and exploit the preferences of receivers. We show here that spatial constraints imposed on the interactions between signallers and receivers favour honest signalling even in the absence of any costs: ‘islands’ of honesty coexist in ‘seas’ of dishonesty. The extent to which honest or dishonest strategies are favoured is shown to depend upon the relative payoffs from signalling and receiving. As the receiving component of fitness becomes greater than the signalling component of fitness, as might be true in ‘life-dinner’ type interactions, honesty is increasingly favoured. In addition, in spatial populations, honesty can be favoured locally even when the mean global payoffs to honesty are lower than the mean payoffs to dishonesty. Our model provides a general framework for analysing signals in spatially structured populations and might therefore apply to signalling in both natural and cultural situations.