Language Evolution and Computation Bibliography

Our site (www.isrl.uiuc.edu/amag/langev) has been retired; please use https://langev.com instead.
Andreea S Calude
2013
PNAS 110(21):8471--8476, 2013
The search for ever deeper relationships among the World’s languages is bedeviled by the fact that most words evolve too rapidly to preserve evidence of their ancestry beyond 5,000 to 9,000 y. On the other hand, quantitative modeling indicates that some “ultraconserved” words exist that might be used to find evidence for deep linguistic relationships beyond that time barrier. Here we use a statistical model, which takes into account the frequency with which words are used in common everyday speech, to predict the existence of a set of such highly conserved words among seven language families of Eurasia postulated to form a linguistic superfamily that evolved from a common ancestor around 15,000 y ago. We derive a dated phylogenetic tree of this proposed superfamily with a time-depth of ∼14,450 y, implying that some frequently used words have been retained in related forms since the end of the last ice age. Words used more than once per 1,000 in everyday speech were 7- to 10-times more likely to show deep ancestry on this tree. Our results suggest a remarkable fidelity in the transmission of some words and give theoretical justification to the search for features of language that might be preserved across wide spans of time and geography.
BioEssays, 2013
The Homeric epics are among the greatest masterpieces of literature, but when they were produced is not known with certainty. Here we apply evolutionary-linguistic phylogenetic statistical methods to differences in Homeric, Modern Greek and ancient Hittite vocabulary items to estimate a date of approximately 710–760 BCE for these great works. Our analysis compared a common set of vocabulary items among the three pairs of languages, recording for each item whether the words in the two languages were cognate – derived from a shared ancestral word – or not. We then used a likelihood-based Markov chain Monte Carlo procedure to estimate the most probable times in years separating these languages given the percentage of words they shared, combined with knowledge of the rates at which different words change. Our date for the epics is in close agreement with historians' and classicists' beliefs derived from historical and archaeological sources.
The Homeric epics are among the greatest masterpieces of literature. The Iliad's story of the Trojan Wars tells us that the epics were almost certainly produced sometime after the 12th century BCE – if indeed the wars were ever fought – but the question is how much later? Herodotus thought considerably later: Writing in the Histories Book II.53 around 450 BCE, he stated that Homer ‘lived, as I believe, not more than 400 years ago’. The most commonly accepted date among modern classicists, drawing on historical, literary and archaeological analyses, is around the mid-8th century BCE [1, 2], although some authors propose a more recent 7th century BCE date [3]. Here, we investigate whether formal statistical modelling of languages can help to inform this historical question. In particular, we investigate whether evolutionary-linguistic statistical methods can be usefully applied to differences in Homeric, Modern Greek and ancient Hittite vocabulary items to provide a date for these great works.
2011
Philosophical Transactions of the Royal Society B: Biological Sciences 366(1567):1101--1107, 2011
We present data from 17 languages on the frequency with which a common set of words is used in everyday language. The languages are drawn from six language families representing 65 per cent of the world's 7000 languages. Our data were collected from ...