Language Evolution and Computation Bibliography

Our site (www.isrl.uiuc.edu/amag/langev) has been retired; please use https://langev.com instead.
Journal :: Nature
2017
Nature 551:223-226, 2017
Both language and genes evolve by transmission over generations with opportunity for differential replication of forms. The understanding that gene frequencies change at random by genetic drift, even in the absence of natural selection, was a seminal advance in evolutionary biology. Stochastic drift must also occur in language as a result of randomness in how linguistic forms are copied between speakers. Here we quantify the strength of selection relative to stochastic drift in language evolution. We use time series derived from large corpora of annotated texts dating from the 12th to 21st centuries to analyse three well-known grammatical changes in English: the regularization of past-tense verbs, the introduction of the periphrastic ‘do’, and variation in verbal negation. We reject stochastic drift in favour of selection in some cases but not in others. In particular, we infer selection towards the irregular forms of some past-tense verbs, which is likely driven by changing frequencies of rhyming patterns over time. We show that stochastic drift is stronger for rare words, which may explain why rare forms are more prone to replacement than common ones. This work provides a method for testing selective theories of language change against a null model and reveals an underappreciated role for stochasticity in language evolution.
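As a rough illustration of what stochastic drift under neutral copying looks like, the following minimal sketch simulates a Wright-Fisher-style process in which each generation of tokens is resampled from the previous generation's form frequencies. This is not the authors' inference method; the function names, token counts, and number of runs are all assumptions chosen for illustration only.

```python
import random


def wright_fisher(n_tokens, p0, generations, rng):
    """Neutral copying: each generation, n_tokens forms are drawn at random
    according to the previous generation's frequency p (no selection)."""
    p = p0
    path = [p]
    for _ in range(generations):
        count = sum(1 for _ in range(n_tokens) if rng.random() < p)
        p = count / n_tokens
        path.append(p)
    return path


def final_spread(n_tokens, runs=100, generations=20, seed=0):
    """Variance of the final frequency across independent runs (hypothetical helper)."""
    rng = random.Random(seed)
    finals = [wright_fisher(n_tokens, 0.5, generations, rng)[-1] for _ in range(runs)]
    mean = sum(finals) / runs
    return sum((x - mean) ** 2 for x in finals) / runs


if __name__ == "__main__":
    # A form copied only 20 times per generation wanders much more
    # than one copied 1000 times per generation.
    print("rare form spread:  ", final_spread(20))
    print("common form spread:", final_spread(1000))
```

In this toy setting, the smaller the number of tokens copied per generation, the larger the run-to-run spread, which is in line with the abstract's statement that drift is stronger for rare words.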
2013
Nature 498(7452):104–108, 2013
Human language, as well as birdsong, relies on the ability to arrange vocal elements in new sequences. However, little is known about the ontogenetic origin of this capacity. Here we track the development of vocal combinatorial capacity in three species of vocal learners, combining an experimental approach in zebra finches (Taeniopygia guttata) with an analysis of natural development of vocal transitions in Bengalese finches (Lonchura striata domestica) and pre-lingual human infants. We find a common, stepwise pattern of acquiring vocal transitions across species. In our first study, juvenile zebra finches were trained to perform one song and then the training target was altered, prompting the birds to swap syllable order, or insert a new syllable into a string. All birds solved these permutation tasks in a series of steps, gradually approximating the target sequence by acquiring new pairwise syllable transitions, sometimes too slowly to accomplish the task fully. Similarly, in the more complex songs of Bengalese finches, branching points and bidirectional transitions in song syntax were acquired in a stepwise fashion, starting from a more restrictive set of vocal transitions. The babbling of pre-lingual human infants showed a similar pattern: instead of a single developmental shift from reduplicated to variegated babbling (that is, from repetitive to diverse sequences), we observed multiple shifts, where each new syllable type slowly acquired a diversity of pairwise transitions, asynchronously over development. Collectively, these results point to a common generative process that is conserved across species, suggesting that the long-noted gap between perceptual versus motor combinatorial capabilities in human infants (ref. 1) may arise partly from the challenges in constructing new pairwise vocal transitions.
2011
Nature, 2011
Languages vary widely but not without limit. The central goal of linguistics is to describe the diversity of human languages and explain the constraints on that diversity. Generative linguists following Chomsky have claimed that linguistic diversity must be constrained by innate parameters that are set as a child learns a language. In contrast, other linguists following Greenberg have claimed that there are statistical tendencies for co-occurrence of traits reflecting universal systems biases, rather than absolute constraints or parametric variation. Here we use computational phylogenetic methods to address the nature of constraints on linguistic diversity in an evolutionary framework. First, contrary to the generative account of parameter setting, we show that the evolution of only a few word-order features of languages are strongly correlated. Second, contrary to the Greenbergian generalizations, we show that most observed functional dependencies between traits are lineage-specific rather than universal tendencies. These findings support the view that, at least with respect to word order, cultural evolution is the primary factor that determines linguistic structure, with the current state of a linguistic system shaping and constraining future states.
Nature 476:291-292, 2011
Tracing a common ancestry between languages becomes harder as the connection goes further back in time. A new test has revealed a surprisingly ancient relationship between a central Siberian and a North American language family.
2010
Nature 467:801-804, 2010
There is disagreement about whether human political evolution has proceeded through a sequence of incremental increases in complexity, or whether larger, non-sequential increases have occurred. The extent to which societies have decreased in complexity is also unclear. These debates have continued largely in the absence of rigorous, quantitative tests. We evaluated six competing models of political evolution in Austronesian-speaking societies using phylogenetic methods. Here we show that in the best-fitting model political complexity rises and falls in a sequence of small steps. This is closely followed by another model in which increases are sequential but decreases can be either sequential or in bigger drops. The results indicate that large, non-sequential jumps in political complexity have not occurred during the evolutionary history of these societies. This suggests that, despite the numerous contingent pathways of human history, there are regularities in cultural evolution that can be detected using computational phylogenetic methods.
2009
Nature, 2009
Mutations in the FOXP2 gene could help explain why humans can speak but chimps can't.
Nature 459(7246):519-520, 2009
Both birdsong and human language are learned, requiring complex social input. New findings show, however, that bird populations 'seeded' with aberrant song input transform it to normal song in a few generations.
Nature 459(7246):564-568, 2009
Culture is typically viewed as consisting of traits inherited epigenetically, through social learning. However, cultural diversity has species-typical constraints(1), presumably of genetic origin. A celebrated, if contentious, example is whether a universal grammar constrains syntactic diversity in human languages(2). Oscine songbirds exhibit song learning and provide biologically tractable models of culture: members of a species show individual variation in song(3) and geographically separated groups have local song dialects(4,5). Different species exhibit distinct song cultures(6,7), suggestive of genetic constraints(8,9). Without such constraints, innovations and copying errors should cause unbounded variation over multiple generations or geographical distance, contrary to observations(9). Here we report an experiment designed to determine whether wild-type song culture might emerge over multiple generations in an isolated colony founded by isolates, and, if so, how this might happen and what type of social environment is required(10). Zebra finch isolates, unexposed to singing males during development, produce song with characteristics that differ from the wild-type song found in laboratory(11) or natural colonies. In tutoring lineages starting from isolate founders, we quantified alterations in song across tutoring generations in two social environments: tutor-pupil pairs in sound-isolated chambers and an isolated semi-natural colony. In both settings, juveniles imitated the isolate tutors but changed certain characteristics of the songs. These alterations accumulated over learning generations. Consequently, songs evolved towards the wild-type in three to four generations. Thus, species-typical song culture can appear de novo. Our study has parallels with language change and evolution(12-14). 
In analogy to models in quantitative genetics(15,16), we model song culture as a multigenerational phenotype partly encoded genetically in an isolate founding population, influenced by environmental variables and taking multiple generations to emerge.
Nature 460(7252):190-196, 2009
Insights from evolutionary developmental biology and the mind sciences could change our understanding of the human capacity to think and the ways in which the human mind constrains cultural expressions.
Nature 462(7270):169-170, 2009
The FOXP2 gene is implicated in the development of human speech and language. A comparison of the human and chimpanzee FOXP2 proteins highlights the differences in function in the two species.
Nature 462(7270):213-217, 2009
The signalling pathways controlling both the evolution and development of language in the human brain remain unknown. So far, the transcription factor FOXP2 (forkhead box P2) is the only gene implicated in Mendelian forms of human speech and language dysfunction. It has been proposed that the amino acid composition in the human variant of FOXP2 has undergone accelerated evolution, and this two-amino-acid change occurred around the time of language emergence in humans. However, this remains controversial, and whether the acquisition of these amino acids in human FOXP2 has any functional consequence in human neurons remains untested. Here we demonstrate that these two human-specific amino acids alter FOXP2 function by conferring differential transcriptional regulation in vitro. We extend these observations in vivo to human and chimpanzee brain, and use network analysis to identify novel relationships among the differentially expressed genes. These data provide experimental support for the functional relevance of changes in FOXP2 that occur on the human lineage, highlighting specific pathways with direct consequences for human brain development and disease in the central nervous system (CNS). Because FOXP2 has an important role in speech and language in humans, the identified targets may have a critical function in the development and evolution of language circuitry in humans.
2008
Nature 453:446-448, 2008
Some researchers think that the evolution of languages can be understood by treating them like genomes, but many linguists don't want to hear about it.
Nature 456:40-41, 2008
Language evolved as part of a uniquely human group of traits, the interdependence of which calls for an integrated approach to the study of brain function, argue Eörs Szathmáry and Szabolcs Számadó.
2007
Nature 449(7163):665-667, 2007
Quantitative relationships between how frequently a word is used and how rapidly it changes over time raise intriguing questions about the way individual behaviours determine large-scale linguistic and cultural change.
Nature 449(7163):713-716, 2007
Human language is based on grammatical rules. Cultural evolution allows these rules to change over time. Rules compete with each other: as new rules rise to prominence, old ones die away. To quantify the dynamics of language evolution, we studied the regularization of English verbs over the past 1,200 years. Although an elaborate system of productive conjugations existed in English's proto-Germanic ancestor, Modern English uses the dental suffix, '-ed', to signify past tense. Here we describe the emergence of this linguistic rule amidst the evolutionary decay of its exceptions, known to us as irregular verbs. We have generated a data set of verbs whose conjugations have been evolving for more than a millennium, tracking inflectional changes to 177 Old-English irregular verbs. Of these irregular verbs, 145 remained irregular in Middle English and 98 are still irregular today. We study how the rate of regularization depends on the frequency of word usage. The half-life of an irregular verb scales as the square root of its usage frequency: a verb that is 100 times less frequent regularizes 10 times as fast. Our study provides a quantitative analysis of the regularization process by which ancestral forms gradually yield to an emerging linguistic rule.
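The square-root scaling stated in the abstract can be checked with a short worked example. The sketch below assumes only the proportionality given there (half-life proportional to the square root of usage frequency, so regularization rate proportional to its inverse); the function name and the frequency ratios are hypothetical, and no constants from the paper are used.

```python
import math


def relative_regularization_rate(frequency_ratio):
    """Rate of regularization relative to a reference verb, for a verb used
    frequency_ratio times as often as that reference verb.

    Assumes half-life ~ sqrt(frequency), hence rate ~ 1 / sqrt(frequency).
    """
    if frequency_ratio <= 0:
        raise ValueError("frequency_ratio must be positive")
    return 1.0 / math.sqrt(frequency_ratio)


if __name__ == "__main__":
    # A verb 100 times less frequent than the reference verb.
    print(relative_regularization_rate(1 / 100))  # roughly 10: about ten times as fast
    # A verb 4 times more frequent than the reference verb.
    print(relative_regularization_rate(4))        # roughly 0.5: about half as fast
```

Only ratios matter here: the abstract gives the scaling law but not an absolute half-life, so the sketch compares verbs rather than predicting dates.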
Nature 449(7163):717-720, 2007
Greek speakers say 'ουρά', Germans 'schwanz' and the French 'queue' to describe what English speakers call a 'tail', but all of these languages use a related form of 'two' to describe the number after one. Among more than 100 Indo-European languages and dialects, the words for some meanings (such as 'tail') evolve rapidly, being expressed across languages by dozens of unrelated words, while others evolve much more slowly, such as the number 'two', for which all Indo-European language speakers use the same related word-form. No general linguistic mechanism has been advanced to explain this striking variation in rates of lexical replacement among meanings. Here we use four large and divergent language corpora (English, Spanish, Russian and Greek) and a comparative database of 200 fundamental vocabulary meanings in 87 Indo-European languages to show that the frequency with which these words are used in modern language predicts their rate of replacement over thousands of years of Indo-European language evolution. Across all 200 meanings, frequently used words evolve at slower rates and infrequently used words evolve more rapidly. This relationship holds separately and identically across parts of speech for each of the four language corpora, and accounts for approximately 50% of the variation in historical rates of lexical replacement. We propose that the frequency with which specific words are used in everyday language exerts a general and law-like influence on their rates of evolution. Our findings are consistent with social models of word change that emphasize the role of selection, and suggest that owing to the ways that humans use language, some words will evolve slowly and others rapidly across all languages.
2006
Nature 440:1117-1118, 2006
Recursion, once thought to be the unique province of human language, now seems to be within the ken of a common songbird, perhaps providing insight into the origins of language.
Nature 440:1204-1207, 2006
Humans regularly produce new utterances that are understood by other members of the same language community. Linguistic theories account for this ability through the use of syntactic rules (or generative grammars) that describe the acceptable structure of utterances. The recursive, hierarchical embedding of language units (for example, words or phrases within shorter sentences) that is part of the ability to construct new utterances minimally requires a 'context-free' grammar that is more complex than the 'finite-state' grammars thought sufficient to specify the structure of all non-human communication signals. Recent hypotheses make the central claim that the capacity for syntactic recursion forms the computational core of a uniquely human language faculty. Here we show that European starlings (Sturnus vulgaris) accurately recognize acoustic patterns defined by a recursive, self-embedding, context-free grammar. They are also able to classify new patterns defined by the grammar and reliably exclude agrammatical patterns. Thus, the capacity to classify sequences from recursive, centre-embedded grammars is not uniquely human. This finding opens a new range of complex syntactic processing mechanisms to physiological investigation.
Nature 441:303, 2006
Syntax sets human language apart from other natural communication systems, although its evolutionary origins are obscure. Here we show that free-ranging putty-nosed monkeys combine two vocalizations into different call sequences that are linked to specific external events, such as the presence of a predator and the imminent movement of the group. Our findings indicate that non-human primates can combine calls into higher-order sequences that have a particular meaning.
2005
Nature 434:289, 2005
Human language is based on syntax, a complex set of rules about how words can be combined. In theory, the emergence of syntactic communication might have been a comparatively straightforward process.
Nature 438:288, 2005
The propensity to make music is the most mysterious, wonderful, and neglected feature of humankind: this is where Steven Mithen began, drawing together strands from archaeology, anthropology, psychology, neuroscience (and, of course, musicology) to explain why we ...
2003
Nature 423:276-279, 2003
There are global threats to biodiversity with current extinction rates well above background levels (ref. 1). Although less well publicized, numerous human languages have also become extinct, and others are threatened with extinction. However, estimates of the number of threatened languages vary considerably owing to the wide range of criteria used. For example, languages have been classified as threatened if the number of speakers is less than 100, 500, 1,000, 10,000, 20,000 or 100,000 (ref. 3). Here I show, by applying internationally agreed criteria for classifying species extinction risk (ref. 4), that languages are more threatened than birds or mammals. Rare languages are more likely to show evidence of decline than commoner ones. Areas with high language diversity also have high bird and mammal diversity and all three show similar relationships to area, latitude, area of forest and, for languages and birds, maximum altitude. The time of human settlement has little effect on current language diversity. Although similar factors explain the diversity of languages and biodiversity, the factors explaining extinction risk for birds and mammals (high altitude, high human densities and insularity) do not explain the numbers of endangered languages.
Nature 424:900, 2003
Thousands of the world's languages are vanishing at an alarming rate, with 90% of them being expected to disappear with the current generation (ref. 1). Here we develop a simple model of language competition that explains historical data on the decline of Welsh, Scottish Gaelic, Quechua (the most common surviving indigenous language in the Americas) and other endangered languages. A linguistic parameter that quantifies the threat of language extinction can be derived from the model and may be useful in the design and evaluation of language-preservation programmes.
Nature 426(6965):435-439, 2003
Languages, like genes, provide vital clues about human history. The origin of the Indo-European language family is "the most intensively studied, yet still most recalcitrant, problem of historical linguistics". Numerous genetic studies of Indo-European origins have also produced inconclusive results. Here we analyse linguistic data using computational methods derived from evolutionary biology. We test two theories of Indo-European origin: the 'Kurgan expansion' and the 'Anatolian farming' hypotheses. The Kurgan theory centres on possible archaeological evidence for an expansion into Europe and the Near East by Kurgan horsemen beginning in the sixth millennium BP. In contrast, the Anatolian theory claims that Indo-European languages expanded with the spread of agriculture from Anatolia around 8,000-9,500 years BP. In striking agreement with the Anatolian hypothesis, our analysis of a matrix of 87 languages with 2,449 lexical items produced an estimated age range for the initial Indo-European divergence of between 7,800 and 9,800 years BP. These results were robust to changes in coding procedures, calibration points, rooting of the trees and priors in the Bayesian analysis.
2002
Nature 417:611-617, 2002
Language is our legacy. It is the main evolutionary contribution of humans, and perhaps the most interesting trait that has emerged in the past 500 million years. Understanding how Darwinian evolution gives rise to human language requires the integration of formal language theory, learning theory and evolutionary dynamics. Formal language theory provides a mathematical description of language and grammar. Learning theory formalizes the task of language acquisition: it can be shown that no procedure can learn an unrestricted set of languages. Universal grammar specifies the restricted set of languages learnable by the human brain. Evolutionary dynamics can be formulated to describe the cultural evolution of language and the biological evolution of universal grammar.
Nature 418:869-872, 2002
Language is a uniquely human trait likely to have been a prerequisite for the development of human culture. The ability to develop articulate speech relies on capabilities, such as fine control of the larynx and mouth, that are absent in chimpanzees and other great apes. FOXP2 is the first gene relevant to the human ability to develop language. A point mutation in FOXP2 co-segregates with a disorder in a family in which half of the members have severe articulation difficulties accompanied by linguistic and grammatical impairment. This gene is disrupted by translocation in an unrelated individual who has a similar disorder. Thus, two functional copies of FOXP2 seem to be required for acquisition of normal spoken language. We sequenced the complementary DNAs that encode the FOXP2 protein in the chimpanzee, gorilla, orang-utan, rhesus macaque and mouse, and compared them with the human cDNA. We also investigated intraspecific variation of the human FOXP2 gene. Here we show that human FOXP2 contains changes in amino-acid coding and a pattern of nucleotide polymorphism, which strongly suggest that this gene has been the target of selection during recent human evolution.
Nature 420:211-217, 2002
Linguistic metaphors have been woven into the fabric of molecular biology since its inception. The determination of the human genome sequence has brought these metaphors to the forefront of the popular imagination, with the natural extension of the notion of DNA as language to that of the genome as the 'book of life'. But do these analogies go deeper and, if so, can the methods developed for analysing languages be applied to molecular biology? In fact, many techniques used in bioinformatics, even if developed independently, may be seen to be grounded in linguistics. Further interweaving of these fields will be instrumental in extending our understanding of the language of life.
2001
Nature 413:465-467, 2001
Does our ability to talk lie in our genes? The suspicion is bolstered by the discovery of a gene that might affect how the brain circuitry needed for speech and language develops.
Nature 413:519-523, 2001
Individuals affected with developmental disorders of speech and language have substantial difficulty acquiring expressive and/or receptive language in the absence of any profound sensory or neurological impairment and despite adequate intelligence and opportunity. Although studies of twins consistently indicate that a significant genetic component is involved, most families segregating speech and language deficits show complex patterns of inheritance, and a gene that predisposes individuals to such disorders has not been identified. We have studied a unique three-generation pedigree, KE, in which a severe speech and language disorder is transmitted as an autosomal-dominant monogenic trait. Our previous work mapped the locus responsible, SPCH1, to a 5.6-cM interval of region 7q31 on chromosome 7 (ref. 5). We also identified an unrelated individual, CS, in whom speech and language impairment is associated with a chromosomal translocation involving the SPCH1 interval. Here we show that the gene FOXP2, which encodes a putative transcription factor containing a polyglutamine tract and a forkhead DNA-binding domain, is directly disrupted by the translocation breakpoint in CS. In addition, we identify a point mutation in affected members of the KE family that alters an invariant amino-acid residue in the forkhead domain. Our findings suggest that FOXP2 is involved in the developmental process that culminates in speech and language.
2000
Nature 404:441-442, 2000
There are no fossils to show how language evolved. But evolutionary game theory is revealing how some of the defining features of human language could have been shaped by natural selection.
Nature 404:495-498, 2000
Animal communication is typically non-syntactic, which means that signals refer to whole situations. Human language is syntactic, and signals consist of discrete components that have their own meaning. Syntax is a prerequisite for taking advantage of combinatorics, that is, 'making infinite use of finite means'. The vast expressive power of human language would be impossible without syntax, and the transition from non-syntactic to syntactic communication was an essential step in the evolution of human language. We aim to understand the evolutionary dynamics of this transition and to analyse how natural selection can guide it. Here we present a model for the population dynamics of language evolution, define the basic reproductive ratio of words and calculate the maximum size of a lexicon. Syntax allows larger repertoires and the possibility to formulate messages that have not been learned beforehand. Nevertheless, according to our model natural selection can only favour the emergence of syntax if the number of required signals exceeds a threshold value. This result might explain why only humans evolved syntactic communication and hence complex language.
1998
Nature 391(6664):279-281, 1998
Deaf children whose access to usable conventional linguistic input, signed or spoken, is severely limited nevertheless use gesture to communicate. These gestures resemble natural language in that they are structured at the level both of sentence and of word. Although the inclination to use gesture may be traceable to the fact that the deaf children's hearing parents, like all speakers, gesture as they talk, the children themselves are responsible for introducing language-like structure into their gestures. We have explored the robustness of this phenomenon by observing deaf children of hearing parents in two cultures, an American and a Chinese culture, that differ in their child-rearing practices and in the way gesture is used in relation to speech. The spontaneous sign systems developed in these cultures shared a number of structural similarities: patterned production and deletion of semantic elements in the surface structure of a sentence; patterned ordering of those elements within the sentence; and concatenation of propositions within a sentence. These striking similarities offer critical empirical input towards resolving the ongoing debate about the 'innateness' of language in human infants.
1994
Nature 372:325, 1994
METHODS. Subjects were tested at Base Camp (altitude 5,300 m) before and after a summit climb attempt, at Camp Two (6,300 m), and at Camp Three (7,150 m), within a day of arriving at each location. No supplementary oxygen was used at the testing altitudes. Each subject ...