Language Evolution and Computation Bibliography

Journal :: PloS one
2018
PloS one 13:176-182, 2018
Language, which allows complex ideas to be communicated through symbolic sequences, is a characteristic feature of our species and manifested in a multitude of forms. Using large written corpora for many different languages and scripts, we show that the occurrence probability distributions of signs at the left and right ends of words have a distinct heterogeneous nature. Characterizing this asymmetry using quantitative inequality measures, viz. information entropy and the Gini index, we show that the beginning of a word is less restrictive in sign usage than the end. This property is not simply attributable to the use of common affixes as it is seen even when only word roots are considered. We use the existence of this asymmetry to infer the direction of writing in undeciphered inscriptions that agrees with the archaeological evidence. Unlike traditional investigations of phonotactic constraints which focus on language-specific patterns, our study reveals a property valid across languages and writing systems. As both language and writing are unique aspects of our species, this universal signature may reflect an innate feature of the human cognitive phenomenon.
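The two inequality measures named in this abstract, information entropy and the Gini index, can be illustrated on a toy word list. This is only a minimal sketch and not the authors' pipeline: the helper names (`sign_distribution`, `shannon_entropy`, `gini_index`) and the toy words are hypothetical, and the Gini index is computed with the standard sorted-values formula over whatever probability vector is passed in.

```python
import math
from collections import Counter


def sign_distribution(words, position):
    """Relative frequency of the sign at `position` (0 = first, -1 = last)."""
    counts = Counter(w[position] for w in words if w)
    total = sum(counts.values())
    return {sign: c / total for sign, c in counts.items()}


def shannon_entropy(probs):
    """Shannon entropy in bits; higher means more even sign usage."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def gini_index(probs):
    """Gini index of a non-negative vector: 0 means perfectly even usage."""
    xs = sorted(probs)
    n = len(xs)
    total = sum(xs)
    # Standard formula with values in ascending order, ranks i = 1..n.
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


if __name__ == "__main__":
    toy_words = ["stone", "story", "stamp", "river", "robin", "lemon"]  # toy data
    for label, pos in (("first sign", 0), ("last sign", -1)):
        dist = sign_distribution(toy_words, pos)
        probs = list(dist.values())
        print(label, round(shannon_entropy(probs), 3), round(gini_index(probs), 3))
```

For a fair comparison between word beginnings and word endings, both probability vectors would presumably need to cover the same sign inventory, with zeros for signs that never occur at a given position; the sketch leaves that step to the caller.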
2017
PloS one 12:272-279, 2017
The novel A Story of the Stone gives precise details of life and social structure in 18th-century China. Its writing took about 10 years, during which the author's habits may have changed significantly. It was published anonymously until the beginning of the 20th century, which left its authorship a mystery. In the present work we focus on the scaling behavior embedded in the sentence series from this novel, hoping to find how ideas are organized from single sentences to the whole text. In particular, we track the evolution of scale invariance to monitor changes in the author's language habits and to find clues to the authorship. The sentence series is separated into a total of 69 non-overlapping segments of 500 sentences each. The correlation-dependent balanced estimation of diffusion entropy (cBEDE) is employed to evaluate the scaling behaviors embedded in the short segments. It is found that the total, the part currently attributed to Xueqin Cao (X-part), and the other part attributed to E Gao (E-part) display scale invariance over a large scale of up to 10^3 sentences, while their scaling exponents are almost identical. All the segments are scale invariant over considerably wide scales, most of which reach one third of the length. In the curve of scaling exponent versus segment number, the X-part has rich patterns with larger values on average, while the E-part has a U-shape with a significantly low bottom. This finding is a new clue supporting the attribution of the E-part to E Gao.
PloS one 12:244-254, 2017
Human language is composed of sequences of reusable elements. The origins of the sequential structure of language are a hotly debated topic in evolutionary linguistics. In this paper, we show that sets of sequences with language-like statistical properties can emerge from a process of cultural evolution under pressure from chunk-based memory constraints. We employ a novel experimental task that is non-linguistic and non-communicative in nature, in which participants are trained on and later asked to recall a set of sequences one by one. Recalled sequences from one participant become training data for the next participant. In this way, we simulate cultural evolution in the laboratory. Our results show a cumulative increase in structure, and by comparing this structure to data from existing linguistic corpora, we demonstrate a close parallel between the sets of sequences that emerge in our experiment and those seen in natural language.
2016
PloS one 11(4):e0151138, 2016
The claim that Eskimo languages have words for different types of snow is well-known among the public, but has been greatly exaggerated through popularization and is therefore viewed with skepticism by many scholars of language. Despite the prominence of this claim, to our knowledge the line of reasoning behind it has not been tested broadly across languages. Here, we note that this reasoning is a special case of the more general view that language is shaped by the need for efficient communication, and we empirically test a variant of it against multiple sources of data, including library reference works, Twitter, and large digital collections of linguistic and meteorological data. Consistent with the hypothesis of efficient communication, we find that languages that use the same linguistic form for snow and ice tend to be spoken in warmer climates, and that this association appears to be mediated by lower communicative need to talk about snow and ice. Our results confirm that variation in semantic categories across languages may be traceable in part to local communicative needs. They suggest moreover that despite its awkward history, the topic of "words for snow" may play a useful role as an accessible instance of the principle that language supports efficient communication.
PloS one 11:803-821, 2016
Despite being a paradigm of quantitative linguistics, Zipf's law for words suffers from three main problems: its formulation is ambiguous, its validity has not been tested rigorously from a statistical point of view, and it has not been confronted with a representatively large number of texts. So, we can summarize the current support for Zipf's law in texts as anecdotal. We try to solve these issues by studying three different versions of Zipf's law and fitting them to all available English texts in the Project Gutenberg database (consisting of more than 30 000 texts). To do so we use state-of-the-art tools in fitting and goodness-of-fit tests, carefully tailored to the peculiarities of text statistics. Remarkably, one of the three versions of Zipf's law, consisting of a pure power-law form in the complementary cumulative distribution function of word frequencies, is able to fit more than 40% of the texts in the database (at the 0.05 significance level), for the whole domain of frequencies (from 1 to the maximum value), and with only one free parameter (the exponent).
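The quantity this abstract refers to, the complementary cumulative distribution function (CCDF) of word frequencies, can be computed in a few lines. The sketch below is an illustration only: the function names are hypothetical, and the log-log least-squares slope is a rough stand-in for an exponent estimate, not the fitting and goodness-of-fit procedure the abstract describes.

```python
import math
from collections import Counter


def frequency_ccdf(tokens):
    """Return sorted (n, S(n)) pairs, where S(n) is the fraction of
    word types whose frequency in `tokens` is at least n."""
    word_freqs = Counter(tokens).values()
    types_with_freq = Counter(word_freqs)  # n -> number of types occurring n times
    total_types = sum(types_with_freq.values())
    ccdf = []
    remaining = total_types
    for n in sorted(types_with_freq):
        ccdf.append((n, remaining / total_types))
        remaining -= types_with_freq[n]
    return ccdf


def loglog_slope(points):
    """Least-squares slope of log(y) against log(x); for a pure power-law
    CCDF this approximates minus the exponent. Needs >= 2 distinct x values."""
    xs = [math.log(x) for x, _ in points]
    ys = [math.log(y) for _, y in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    toy_tokens = "a a a a b b c d".split()  # toy data, not a real text
    ccdf = frequency_ccdf(toy_tokens)
    print(ccdf)
    print(loglog_slope(ccdf))
```

Least squares on log-log axes is known to give biased exponent estimates, so a serious analysis would replace `loglog_slope` with a maximum-likelihood fit and a goodness-of-fit test.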
2015
PloS one 10:1-64, 2015
Communicative interactions involve a kind of procedural knowledge that is used by the human brain for processing verbal and nonverbal inputs and for language production. Although considerable work has been done on modeling human language abilities, it has been difficult to bring them together into a comprehensive tabula rasa system compatible with current knowledge of how verbal information is processed in the brain. This work presents a cognitive system, entirely based on a large-scale neural architecture, which was developed to shed light on the procedural knowledge involved in language elaboration. The main component of this system is the central executive, which is a supervising system that coordinates the other components of the working memory. In our model, the central executive is a neural network that takes as input the neural activation states of the short-term memory and yields as output mental actions, which control the flow of information among the working memory components through neural gating mechanisms. The proposed system is capable of learning to communicate through natural language starting from tabula rasa, without any a priori knowledge of the structure of phrases, the meaning of words, or the role of the different classes of words, only by interacting with a human through a text-based interface, using an open-ended incremental learning process. It is able to learn nouns, verbs, adjectives, pronouns and other word classes, and to use them in expressive language. The model was validated on a corpus of 1587 input sentences, based on literature on early language assessment, at the level of a child of about 4 years, and produced 521 output sentences, expressing a broad range of language processing functionalities.
PloS one 10:299-345, 2015
Explaining the diversity of languages across the world is one of the central aims of typological, historical, and evolutionary linguistics. We consider the effect of language contact (the number of non-native speakers a language has) on the way languages change and evolve. By analysing hundreds of languages within and across language families, regions, and text types, we show that languages with greater levels of contact typically employ fewer word forms to encode the same information content (a property we refer to as lexical diversity). Based on three types of statistical analyses, we demonstrate that this variance can in part be explained by the impact of non-native speakers on information encoding strategies. Finally, we argue that languages are information encoding systems shaped by the varying needs of their speakers. Language evolution and change should be modeled as the co-evolution of multiple intertwined adaptive systems: on one hand, the structure of human societies and human learning capabilities, and on the other, the structure of language.
PloS one 10:786-787, 2015
Phylogenetic models, originally developed to demonstrate evolutionary biology, have been applied to a wide range of cultural data including natural language lexicons, manuscripts, folktales, material cultures, and religions. A fundamental question regarding the application of phylogenetic inference is whether trees are an appropriate approximation of cultural evolutionary history. Their validity in cultural applications has been scrutinized, particularly with respect to the lexicons of dialects in contact. Phylogenetic models organize evolutionary data into a series of branching events through time. However, branching events are typically not included in dialectological studies to interpret the distributions of lexical terms. Instead, dialectologists have offered spatial interpretations to represent lexical data. For example, new lexical items that emerge in a politico-cultural center are likely to spread to peripheries, but not vice versa. To explore the question of the tree model's validity, we present a simple simulation model in which dialects form a spatial network and share lexical items through contact rather than through common ancestors. We input several network topologies to the model to generate synthetic data. We then analyze the synthesized data using conventional phylogenetic techniques. We found that a group of dialects can be considered tree-like even if it has not evolved in a temporally tree-like manner but has a temporally invariant, spatially tree-like structure. In addition, the simulation experiments appear to reproduce unnatural results observed in reconstructed trees for real data. These results motivate further investigation into the spatial structure of the evolutionary history of dialect lexicons as well as other cultural characteristics.
PloS one 10:356-13, 2015
Memory is essential to many cognitive tasks including language. Apart from empirical studies of memory effects on language acquisition and use, there is a lack of evolutionary explorations of whether a high level of memory capacity is a prerequisite for language and whether language origin could influence memory capacity. In line with evolutionary theories that natural selection refined language-related cognitive abilities, we advocated a coevolution scenario between language and memory capacity, which incorporated the genetic transmission of individual memory capacity, cultural transmission of idiolects, and natural and cultural selections on individual reproduction and language teaching. To illustrate the coevolution dynamics, we adopted a multi-agent computational model simulating the emergence of lexical items and simple syntax through iterated communications. Simulations showed that, along with the origin of a communal language, an initially low memory capacity for acquired linguistic knowledge was boosted; that this coherent increase in linguistic understandability and memory capacities reflected a language-memory coevolution; and that this coevolution stopped once memory capacities became sufficient for language communications. Statistical analyses revealed that the coevolution was realized mainly by natural selection based on individual communicative success in cultural transmissions. This work elaborated the biology-culture parallelism of language evolution, demonstrated the driving force of culturally-constituted factors for natural selection of individual cognitive abilities, and suggested that the degree difference in language-related cognitive abilities between humans and nonhuman animals could result from a coevolution with language.
PloS one 10:489-509, 2015
Language universals have long been attributed to an innate Universal Grammar. An alternative explanation states that linguistic universals emerged independently in every language in response to shared cognitive or perceptual biases. A computational model has recently shown how this could be the case, focusing on the paradigmatic example of the universal properties of colour naming patterns, and producing results in quantitative agreement with the experimental data. Here we investigate the role of an individual perceptual bias in the framework of the model. We study how, and to what extent, the structure of the bias influences the corresponding linguistic universal patterns. We show that the cultural history of a group of speakers introduces population-specific constraints that act against the pressure for uniformity arising from the individual bias, and we clarify the interplay between these two forces.
PloS one 10, 2015
How communication systems emerge is a topic of relevance to several academic disciplines. Numerous existing models, both mathematical and computational, study this emergence. However, with few exceptions, these models all build some form of communication into their initial specification. Consequently, what these models study is how communication systems transition from one form to another, and not how communication itself emerges in the first place. Here we present a new computational model of the emergence of communication which, unlike previous models, does not pre-specify the existence of communication. We conduct two experiments using this model, in order to derive general statements about how communication systems emerge. The two main routes to communication that we identify correspond with findings from the empirical literature on the evolution of animal signals. We use this finding to explain when and why we should expect communication to emerge in nature. We also compare our model to experimental research on the origins of human communication systems, and hence show that humans are an important exception to the general trends we observe. We argue that this is because humans, and probably only humans, are able to ‘signal signalhood’, i.e. to express communicative intentions.
2014
PloS one 9:839-862, 2014
Human languages are rule governed, but almost invariably these rules have exceptions in the form of irregularities. Since rules in language are efficient and productive, the persistence of irregularity is an anomaly. How does irregularity linger in the face of internal (endogenous) and external (exogenous) pressures to conform to a rule? Here we address this problem by taking a detailed look at simple past tense verbs in the Corpus of Historical American English. The data show that the language is open, with many new verbs entering. At the same time, existing verbs might tend to regularize or irregularize as a consequence of internal dynamics, but overall, the amount of irregularity sustained by the language stays roughly constant over time. Despite continuous vocabulary growth, and presumably, an attendant increase in expressive power, there is no corresponding growth in irregularity. We analyze the set of irregulars, showing they may adhere to a set of minority rules, allowing for increased stability of irregularity over time. These findings contribute to the debate on how language systems become rule governed, and how and why they sustain exceptions to rules, providing insight into the interplay between the emergence and maintenance of rules and exceptions in language.
2013
PloS one 8:495-498, 2013
It is well known that word frequencies arrange themselves according to Zipf's law. However, little is known about the dependency between the parameters of the law and the complexity of a communication system. Many models of the evolution of language assume that the exponent of the law remains constant as the complexity of a communication system increases. Using longitudinal studies of child language, we analysed the word rank distribution for the speech of children and adults participating in conversations. The adults typically included family members (e.g., parents) or the investigators conducting the research. Our analysis of the evolution of Zipf's law yields two main unexpected results. First, in children the exponent of the law tends to decrease over time while this tendency is weaker in adults, thus suggesting this is not a mere mirror effect of adult speech. Second, although the exponent of the law is more stable in adults, their exponents fall below 1, which is the typical value of the exponent assumed in both children and adults. Our analysis also shows a tendency of the mean length of utterances (MLU), a simple estimate of syntactic complexity, to increase as the exponent decreases. The parallel evolution of the exponent and a simple indicator of syntactic complexity (MLU) supports the hypothesis that the exponent of Zipf's law and linguistic complexity are inter-related. The assumption that Zipf's law for word ranks is a power law with a constant exponent of one in both adults and children needs to be revised.
PloS one 8:300-307, 2013
This study examines the intergenerational transfer of human communication systems. It tests if human communication systems evolve to be easy to learn or easy to use (or both), and how population size affects learnability and usability. Using an experimental-semiotic task, we find that human communication systems evolve to be easier to use (production efficiency and reproduction fidelity), but harder to learn (identification accuracy) for a second generation of naïve participants. Thus, usability trumps learnability. In addition, the communication systems that evolve in larger populations exhibit distinct advantages over those that evolve in smaller populations: the learnability loss (from the initial signs) is more muted and the usability benefits are more pronounced. The usability benefits for human communication systems that evolve in small and large populations are explained through guided variation reducing sign complexity. The enhanced performance of the communication systems that evolve in larger populations is explained by the operation of a content bias acting on the larger pool of competing signs. The content bias selects for information-efficient iconic signs that aid learnability and enhance usability.
PLoS ONE 8(1):e52742, 2013
We propose a simple model for genetic adaptation to a changing environment, describing a fitness landscape characterized by two maxima. One is associated with “specialist” individuals that are adapted to the environment; this maximum moves over time as the environment changes. The other maximum is static, and represents “generalist” individuals not affected by environmental changes. The rest of the landscape is occupied by “maladapted” individuals. Our analysis considers the evolution of these three subpopulations. Our main result is that, in the presence of a sufficiently stable environmental feature, as in the case of an unchanging aspect of a physical habitat, specialists can dominate the population. By contrast, rapidly changing environmental features, such as language or cultural habits, are a moving target for the genes; here, generalists dominate, because the best evolutionary strategy is to adopt neutral alleles not specialized for any specific environment. The model we propose is based on simple assumptions about evolutionary dynamics and describes all possible scenarios in a non-trivial phase diagram. The approach provides a general framework to address such fundamental issues as the Baldwin effect, the biological basis for language, or the ecological consequences of rapid climate change.
PLoS ONE 8(1):e55009, 2013
Understanding the patterns and causes of differential structural stability is an area of major interest for the study of language change and evolution. It is still debated whether structural features have intrinsic stabilities across language families and geographic areas, or if the processes governing their rate of change are completely dependent upon the specific context of a given language or language family. We conducted an extensive literature review and selected seven different approaches to conceptualising and estimating the stability of structural linguistic features, aiming at comparing them using the same dataset, the World Atlas of Language Structures. We found that, despite profound conceptual and empirical differences between these methods, they tend to agree in classifying some structural linguistic features as being more stable than others. This suggests that there are intrinsic properties of such structural features influencing their stability across methods, language families and geographic areas. This finding is a major step towards understanding the nature of structural linguistic features and their interaction with idiosyncratic, lineage- and area-specific factors during language change and evolution.
PLoS ONE 8(2):e56230, 2013
Our goal in this study is to characterize the functions of language areas in the most precise terms. Previous neuroimaging studies have reported that more complex sentences elicit larger activations in the left inferior frontal gyrus (L. F3op/F3t), although the most critical factor still remains to be identified. We hypothesize that pseudowords with grammatical particles and morphosyntactic information alone impose a construction of syntactic structures, just like normal sentences, and that “the Degree of Merger” (DoM) in recursively merged sentences parametrically modulates neural activations. Using jabberwocky sentences with distinct constructions, we fitted various parametric models of syntactic, other linguistic, and nonlinguistic factors to activations measured with functional magnetic resonance imaging. We demonstrated that the models of DoM and “DoM+number of Search (searching syntactic features)” were the best to explain activations in the L. F3op/F3t and supramarginal gyrus (L. SMG), respectively. We further introduced letter strings, which had neither lexical associations nor grammatical particles, but retained both matching orders and symbol orders of sentences. Directly contrasting jabberwocky sentences with letter strings showed that localized activations in L. F3op/F3t and L. SMG were indeed independent of matching orders and symbol orders. Moreover, by using dynamic causal modeling, we found that the model with an inhibitory modulatory effect for the bottom-up connectivity from L. SMG to L. F3op/F3t was the best one. For this best model, the top-down connection from L. F3op/F3t to L. SMG was significantly positive. By using diffusion-tensor imaging, we confirmed that the left dorsal pathway of the superior longitudinal and arcuate fasciculi consistently connected these regions. Lastly, we established that nonlinguistic order-related and error-related factors significantly activated the right (R.) lateral premotor cortex and R. F3op/F3t, respectively. These results indicate that the identified network of L. F3op/F3t and L. SMG subserves the calculation of DoM in recursively merged sentences.
PLoS ONE 8(3):e58960, 2013
Grammatical agreement means that features associated with one linguistic unit (for example number or gender) become associated with another unit and are then possibly overtly expressed, typically with morphological markers. It is one of the key mechanisms used in many languages to show that certain linguistic units within an utterance grammatically depend on each other. Agreement systems are puzzling because they can be highly complex in terms of what features they use and how they are expressed. Moreover, agreement systems have undergone considerable change in the historical evolution of languages. This article presents language game models with populations of agents in order to find out for what reasons and by what cultural processes and cognitive strategies agreement systems arise. It demonstrates that agreement systems are motivated by the need to minimize combinatorial search and semantic ambiguity, and it shows, for the first time, that once a population of agents adopts a strategy to invent, acquire and coordinate meaningful markers through social learning, linguistic self-organization leads to the spontaneous emergence and cultural transmission of an agreement system. The article also demonstrates how attested grammaticalization phenomena, such as phonetic reduction and conventionalized use of agreement markers, happen as a side effect of additional economizing principles, in particular minimization of articulatory effort and reduction of the marker inventory. More generally, the article illustrates a novel approach for studying how key features of human languages might emerge.
PLoS ONE 8(4):e62243, 2013
Languages evolve over space and time. Illuminating the evolutionary history of language is important because it provides a unique opportunity to shed light on the population history of the speakers. Spatial and temporal aspects of language evolution are particularly crucial for understanding demographic history, as they allow us to identify when and where the languages originated, as well as how they spread across the globe. Here we apply Bayesian phylogeographic methods to reconstruct the spatiotemporal evolution of the Ainu language: an endangered language spoken by an indigenous group that once thrived in northern Japan. The conventional dual-structure model has long argued that modern Ainu are direct descendants of a single, Pleistocene human lineage from Southeast Asia, namely the Jomon people. In contrast, recent archaeological, anthropological and genetic evidence suggests that the Ainu are an outcome of significant genetic and cultural contributions from Siberian hunter-gatherers, the Okhotsk, who migrated into northern Hokkaido around 900–1600 years ago. Estimating from 19 Ainu language varieties preserved five decades ago, our analysis shows that they are descendants of a common ancestor who spread from northern Hokkaido around 1300 years ago. In addition to several lines of emerging evidence, our phylogeographic analysis strongly supports the hypothesis that recent expansion of the Okhotsk to northern Hokkaido had a profound impact on the origins of the Ainu people and their culture, and hence calls for a refinement to the dual-structure model.
PLoS ONE 8(5):e63238, 2013
The ASJP (Automated Similarity Judgment Program) described an automated, lexical similarity-based method for dating the world’s language groups using 52 archaeological, epigraphic and historical calibration date points. The present paper describes a new automated dating method, based on phonotactic diversity. Unlike ASJP, our method does not require any information on the internal classification of a language group. Also, the method can use all the available word lists for a language and its dialects eschewing the debate on ‘language’ vs. ‘dialect’. We further combine these dates and provide a new baseline which, to our knowledge, is the best one. We make a systematic comparison of our method, ASJP’s dating procedure, and combined dates. We predict time depths for world’s language families and sub-families using this new baseline. Finally, we explain our results in the model of language change given by Nettle.
2012
PLoS ONE 7(3):e33171, 2012
Language change takes place primarily via diffusion of linguistic variants in a population of individuals. Identifying selective pressures on this process is important not only to construe and predict changes, but also to inform theories of evolutionary dynamics of socio-cultural factors. In this paper, we advocate the Price equation from evolutionary biology and the Polya-urn dynamics from contagion studies as efficient ways to discover selective pressures. Using the Price equation to process the simulation results of a computer model that follows the Polya-urn dynamics, we analyze theoretically a variety of factors that could affect language change, including variant prestige, transmission error, individual influence and preference, and social structure. Among these factors, variant prestige is identified as the sole selective pressure, whereas others help modulate the degree of diffusion only if variant prestige is involved. This multidisciplinary study discerns the primary and complementary roles of linguistic, individual learning, and socio-cultural factors in language change, and offers insight into empirical studies of language change.
PLoS ONE 7(4):e35025, 2012

Background

Recent advances in automated assessment of basic vocabulary lists allow the construction of linguistic phylogenies useful for tracing dynamics of human population expansions, reconstructing ancestral cultures, and modeling transition rates of cultural traits over time.

Methods

Here we investigate the Tupi expansion, a widely-dispersed language family in lowland South America, with a distance-based phylogeny based on 40-word vocabulary lists from 48 languages. We coded 11 cultural traits across the diverse Tupi family including traditional warfare patterns, post-marital residence, corporate structure, community size, paternity beliefs, sibling terminology, presence of canoes, tattooing, shamanism, men's houses, and lip plugs.

Results/Discussion

The linguistic phylogeny supports a Tupi homeland in west-central Brazil with subsequent major expansions across much of lowland South America. Consistently, ancestral reconstructions of cultural traits over the linguistic phylogeny suggest that social complexity has tended to decline through time, most notably in the independent emergence of several nomadic hunter-gatherer societies. Estimated rates of cultural change across the Tupi expansion are on the order of only a few changes per 10,000 years, in accord with previous cultural phylogenetic results in other language families around the world, and indicate a conservative nature to much of human culture.
PLoS ONE 7(4):e35289, 2012
Language is a key adaptation of our species, yet we do not know when it evolved. Here, we use data on language phonemic diversity to estimate a minimum date for the origin of language. We take advantage of the fact that phonemic diversity evolves slowly and use it as a clock to calculate how long the oldest African languages would have to have been around in order to accumulate the number of phonemes they possess today. We use a natural experiment, the colonization of Southeast Asia and Andaman Islands, to estimate the rate at which phonemic diversity increases through time. Using this rate, we estimate that present-day languages date back to the Middle Stone Age in Africa. Our analysis is consistent with the archaeological evidence suggesting that complex human behavior evolved during the Middle Stone Age in Africa, and does not support the view that language is a recent adaptation that has sparked the dispersal of humans out of Africa. While some of our assumptions require testing and our results rely at present on a single case-study, our analysis constitutes the first estimate of when language evolved that is directly based on linguistic data.
Naming a Structured World: A Cultural Route to Duality of Patterning
PLoS ONE 7(6):e37744, 2012
The lexicons of human languages organize their units at two distinct levels. At a first combinatorial level, meaningless forms (typically referred to as phonemes) are combined into meaningful units (typically referred to as morphemes). Thanks to this, many ...
PLoS ONE 7(6):e38236, 2012
The advent of humanoid robots has enabled a new approach to investigating the acquisition of language, and we report on the development of robots able to acquire rudimentary linguistic skills. Our work focuses on early stages analogous to some characteristics of a human child of about 6 to 14 months, the transition from babbling to first word forms. We investigate one mechanism among many that may contribute to this process, a key factor being the sensitivity of learners to the statistical distribution of linguistic elements. As well as being necessary for learning word meanings, the acquisition of anchor word forms facilitates the segmentation of an acoustic stream through other mechanisms. In our experiments some salient one-syllable word forms are learnt by a humanoid robot in real-time interactions with naive participants. Words emerge from random syllabic babble through a learning process based on a dialogue between the robot and the human participant, whose speech is perceived by the robot as a stream of phonemes. Numerous ways of representing the speech as syllabic segments are possible. Furthermore, the pronunciation of many words in spontaneous speech is variable. However, in line with research elsewhere, we observe that salient content words are more likely than function words to have consistent canonical representations; thus their relative frequency increases, as does their influence on the learner. Variable pronunciation may contribute to early word form acquisition. The importance of contingent interaction in real-time between teacher and learner is reflected by a reinforcement process, with variable success. The examination of individual cases may be more informative than group results. Nevertheless, word forms are usually produced by the robot after a few minutes of dialogue, employing a simple, real-time, frequency dependent mechanism. 
This work shows the potential of human-robot interaction systems in studies of the dynamics of early language acquisition.
PLoS ONE 7(7):e40137, 2012
Many patterns displayed by the distribution of human linguistic groups are similar to the ecological organization described for biological species. It remains a challenge to identify simple and meaningful processes that describe these patterns. The population size distribution of human linguistic groups, for example, is well fitted by a log-normal distribution that may arise from stochastic demographic processes. As we show in this contribution, the distribution of the area size of home ranges of those groups also agrees with a log-normal function. Further, size and area are significantly correlated: the number of speakers and the area spanned by linguistic groups follow an allometric relation, with an exponent varying across different world regions. The empirical evidence presented leads to the hypothesis that the distributions of population size and area, and their mutual dependence, rely on demographic dynamics and on the result of conflicts over territory due to group growth. To substantiate this point, we introduce a two-variable stochastic multiplicative model whose analytical solution recovers the empirical observations. Applied to different world regions, the model reveals that the retreat in home range is sublinear with respect to the decrease in population size, and that the population-area exponent grows with the typical strength of conflicts. While the shape of the population size and area distributions, and their allometric relation, seem unavoidable outcomes of demography and inter-group contact, the precise value of the exponent could give insight on the cultural organization of those human groups in the last thousand years.
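As an illustration only, the allometric relation described in this abstract can be read as a power law linking the area spanned by a group to its number of speakers. The sketch below estimates the exponent by ordinary least squares on log-transformed values. The names `fit_power_law`, `z` and `c` are hypothetical and chosen for this example; the abstract does not specify the estimation procedure used in the paper, so this should not be taken as the authors' method.

```python
import math

def fit_power_law(populations, areas):
    """Fit areas ~ c * populations**z by least squares on logarithms.

    Returns (z, c). Both symbols are illustrative names, not taken
    from the paper.
    """
    xs = [math.log(p) for p in populations]
    ys = [math.log(a) for a in areas]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    z = sxy / sxx                        # slope in log-log space = exponent
    c = math.exp(mean_y - z * mean_x)    # intercept in log-log space = log(c)
    return z, c
```

An exponent below 1 in such a fit corresponds to the sublinear dependence of home range on population size mentioned in the abstract.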
PLoS ONE 7(8):e43807, 2012
Previous studies have shown that iconic graphical signs can evolve into symbols through repeated usage within dyads and interacting communities. Here we investigate the evolution of graphical signs over chains of participants. In these chains (or “replacement ...
PLoS ONE 7(9):e45198, 2012
Language is the best example of a cultural evolutionary system, able to retain a phylogenetic signal over many thousands of years. The temporal stability (conservatism) of basic vocabulary is relatively well understood, but the stability of the structural properties of language (phonology, morphology, syntax) is still unclear. Here we report an extensive Bayesian phylogenetic investigation of the structural stability of numerous features across many language families and we introduce a novel method for analyzing the relationships between the stability profiles of language families. We found that there is a strong universal component across language families, suggesting the existence of universal linguistic, cognitive and genetic constraints. Against this background, however, each language family has a distinct stability profile, and these profiles cluster by geographic area and likely deep genealogical relationships. These stability profiles seem to show, for example, the ancient historical relationships between the Siberian and American language families, presumed to be separated by at least 12,000 years, and possible connections between the Eurasian families. We also found preliminary support for the punctuated evolution of structural features of language across families, types of features and geographic areas. Thus, such higher-level properties of language seen as an evolutionary system might allow the investigation of ancient connections between languages and shed light on the peopling of the world.
PLoS ONE 7(10):e48029, 2012
In contrast with animal communication systems, diversity is characteristic of almost every aspect of human language. Languages variously employ tones, clicks, or manual signs to signal differences in meaning; some languages lack the noun-verb distinction (e.g., Straits Salish), whereas others have a proliferation of fine-grained syntactic categories (e.g., Tzeltal); and some languages do without morphology (e.g., Mandarin), while others pack a whole sentence into a single word (e.g., Cayuga). A challenge for evolutionary biology is to reconcile the diversity of languages with the high degree of biological uniformity of their speakers. Here, we model processes of language change and geographical dispersion and find a consistent pressure for flexible learning, irrespective of the language being spoken. This pressure arises because flexible learners can best cope with the observed high rates of linguistic change associated with divergent cultural evolution following human migration. Thus, rather than genetic adaptations for specific aspects of language, such as recursion, the coevolution of genes and fast-changing linguistic structure provides the biological basis for linguistic diversity. Only biological adaptations for flexible learning combined with cultural evolution can explain how each child has the potential to learn any human language.
PLoS ONE 7(12):e52064, 2012
The warp ikat method of making decorated textiles is one of the most geographically widespread in southeast Asia, being used by Austronesian peoples in Indonesia, Malaysia and the Philippines, and Daic peoples on the Asian mainland. In this study a dataset consisting of the decorative characters of 36 of these warp ikat weaving traditions is investigated using Bayesian and Neighbornet techniques, and the results are used to construct a phylogenetic tree and taxonomy for warp ikat weaving in southeast Asia. The results and analysis show that these diverse traditions have a common ancestor amongst neolithic cultures of the Asian mainland, and parallels exist between the patterns of textile weaving descent and linguistic phylogeny for the Austronesian group. Ancestral state analysis is used to reconstruct some of the features of the ancestral weaving tradition. The widely held theory that weaving motifs originated in the late Bronze Age Dong-Son culture is shown to be inconsistent with the data.
2011
PLoS ONE 6(2):e16677, 2011
Human languages evolve continuously, and a puzzling problem is how to reconcile the apparent robustness of most of the deep linguistic structures we use with the evidence that they undergo possibly slow, yet ceaseless, changes. Is the state in which we observe languages today closer to what would be a dynamical attractor with statistically stationary properties or rather closer to a non-steady state slowly evolving in time? Here we address this question in the framework of the emergence of shared linguistic categories in a population of individuals interacting through language games. The observed emerging asymptotic categorization, which has been previously tested - with success - against experimental data from human languages, corresponds to a metastable state where global shifts are always possible but progressively more unlikely and the response properties depend on the age of the system. This aging mechanism exhibits striking quantitative analogies to what is observed in the statistical mechanics of glassy systems. We argue that this can be a general scenario in language dynamics where shared linguistic conventions would not emerge as attractors, but rather as metastable states.
PLoS ONE 6(2):e17333, 2011
In this paper we extract the topology of the semantic space in its encyclopedic acception, measuring the semantic flow between the different entries of the largest modern encyclopedia, Wikipedia, and thus creating a directed complex network of semantic flows. ...
PLoS ONE 6(4):e14810, 2011
Background

Archaeologists and anthropologists have long recognized that different cultural complexes may have distinct descent histories, but they have lacked analytical techniques capable of easily identifying such incongruence. Here, we show how Bayesian phylogenetic analysis can be used to identify incongruent cultural histories. We employ the approach to investigate Iranian tribal textile traditions.

Methods

We used Bayes factor comparisons in a phylogenetic framework to test two models of cultural evolution: the hierarchically integrated system hypothesis and the multiple coherent units hypothesis. In the hierarchically integrated system hypothesis, a core tradition of characters evolves through descent with modification and characters peripheral to the core are exchanged among contemporaneous populations. In the multiple coherent units hypothesis, a core tradition does not exist. Rather, there are several cultural units consisting of sets of characters that have different histories of descent.

Results

For the Iranian textiles, the Bayesian phylogenetic analyses supported the multiple coherent units hypothesis over the hierarchically integrated system hypothesis. Our analyses suggest that pile-weave designs represent a distinct cultural unit that has a different phylogenetic history compared to other textile characters.

Conclusions

The results from the Iranian textiles are consistent with the available ethnographic evidence, which suggests that the commercial rug market has influenced pile-rug designs but not the techniques or designs incorporated in the other textiles produced by the tribes. We anticipate that Bayesian phylogenetic tests for inferring cultural units will be of great value for researchers interested in studying the evolution of cultural traits including language, behavior, and material culture.
PLoS ONE 6(4):e18852, 2011
The evolutionary origin of human language and its neurobiological foundations has long been the object of intense scientific debate. Although a number of theories have been proposed, one particularly contentious model suggests that human language evolved from a manual gestural communication system in a common ape-human ancestor. Consistent with a gestural origins theory are data indicating that chimpanzees intentionally and referentially communicate via manual gestures, and the production of manual gestures, in conjunction with vocalizations, activates the chimpanzee Broca's area homologue, a region in the human brain that is critical for the planning and execution of language. However, it is not known if this activity observed in the chimpanzee Broca's area is the result of the chimpanzees producing manual communicative gestures, communicative sounds, or both. This information is critical for evaluating the theory that human language evolved from a strictly manual gestural system. To this end, we used positron emission tomography (PET) to examine the neural metabolic activity in the chimpanzee brain. We collected PET data in 4 subjects, all of whom produced manual communicative gestures. However, 2 of these subjects also produced so-called attention-getting vocalizations directed towards a human experimenter. Interestingly, only the two subjects that produced these attention-getting sounds showed greater mean metabolic activity in the Broca's area homologue as compared to a baseline scan. The two subjects that did not produce attention-getting sounds did not. These data contradict an exclusive gestural origins theory for they suggest that it is vocal signaling that selectively activates the Broca's area homologue in chimpanzees.
In other words, the activity observed in the Broca's area homologue reflects the production of vocal signals by the chimpanzees, suggesting that this critical human language region was involved in vocal signaling in the common ancestor of both modern humans and chimpanzees.
PLoS ONE 6(5):e19009, 2011
Patterns of word use both reflect and influence a myriad of human activities and interactions. Like other entities that are reproduced and evolve, words rise or decline depending upon a complex interplay between their intrinsic properties and the environments in which they function. Using Internet discussion communities as model systems, we define the concept of a word niche as the relationship between the word and the characteristic features of the environments in which it is used. We develop a method to quantify two important aspects of the size of the word niche: the range of individuals using the word and the range of topics it is used to discuss. Controlling for word frequency, we show that these aspects of the word niche are strong determinants of changes in word frequency. Previous studies have already indicated that word frequency itself is a correlate of word success at historical time scales. Our analysis of changes in word frequencies over time reveals that the relative sizes of word niches are far more important than word frequencies in the dynamics of the entire vocabulary at shorter time scales, as the language adapts to new concepts and social groupings. We also distinguish endogenous versus exogenous factors as additional contributors to the fates of words, and demonstrate the force of this distinction in the rise of novel words. Our results indicate that short-term nonstationarity in word statistics is strongly driven by individual proclivities, including inclinations to provide novel information and to project a distinctive social identity.
PLoS ONE 6(5):e19875, 2011
Background The language faculty is probably the most distinctive feature of our species, and endows us with a unique ability to exchange highly structured information. In written language, information is encoded by the concatenation of basic symbols under ...
PLoS ONE 6(6):e20109, 2011
Historical linguistics aims at inferring the most likely language phylogenetic tree starting from information concerning the evolutionary relatedness of languages. The available information typically consists of lists of homologous (lexical, phonological, syntactic) features or characters for many different languages: a set of parallel corpora whose compilation represents a paramount achievement in linguistics.

From this perspective the reconstruction of language trees is an example of inverse problems: starting from present, incomplete and often noisy, information, one aims at inferring the most likely past evolutionary history. A fundamental issue in inverse problems is the evaluation of the inference made. A standard way of dealing with this question is to generate data with artificial models in order to have full access to the evolutionary process one is going to infer. This procedure presents an intrinsic limitation: when dealing with real data sets, one typically does not know which model of evolution is the most suitable for them. A possible way out is to compare algorithmic inference with expert classifications. This is the point of view we take here by conducting a thorough survey of the accuracy of reconstruction methods as compared with the Ethnologue expert classifications. We focus in particular on state-of-the-art distance-based methods for phylogeny reconstruction using worldwide linguistic databases.

In order to assess the accuracy of the inferred trees we introduce and characterize two generalizations of standard definitions of distances between trees. Based on these scores we quantify the relative performances of the distance-based algorithms considered. Further we quantify how the completeness and the coverage of the available databases affect the accuracy of the reconstruction. Finally we draw some conclusions about where the accuracy of the reconstructions in historical linguistics stands and about the leading directions to improve it.

PLoS ONE 6(9):e25195, 2011
In recent years, linguists have begun to increasingly rely on quantitative phylogenetic approaches to examine language evolution. Some linguists have questioned the suitability of phylogenetic approaches on the grounds that linguistic evolution is largely reticulate ...
PLoS ONE 6(11):e26822, 2011
The voluntary control of phonation is a crucial achievement in the evolution of speech. In humans, ventral premotor cortex (PMv) and Broca's area are known to be involved in voluntary phonation. In contrast, no neurophysiological data are available about the role of the oro-facial sector of nonhuman primates PMv in this function. In order to address this issue, we recorded PMv neurons from two monkeys trained to emit coo-calls. Results showed that a population of motor neurons specifically fire during vocalization. About two thirds of them discharged before sound onset, while the remaining were time-locked with it. The response of vocalization-selective neurons was present only during conditioned (voluntary) but not spontaneous (emotional) sound emission. These data suggest that the control of vocal production exerted by PMv neurons constitutes a newly emerging property in the monkey lineage, shedding light on the evolution of phonation-based communication from a nonhuman primate species.
2010
PLoS ONE 5(1):e8559, 2010

Background

Languages differ greatly both in their syntactic and morphological systems and in the social environments in which they exist. We challenge the view that language grammars are unrelated to social environments in which they are learned and used.

Methodology/Principal Findings

We conducted a statistical analysis of >2,000 languages using a combination of demographic sources and the World Atlas of Language Structures, a database of structural language properties. We found strong relationships between linguistic factors related to morphological complexity, and demographic/socio-historical factors such as the number of language users, geographic spread, and degree of language contact. The analyses suggest that languages spoken by large groups have simpler inflectional morphology than languages spoken by smaller groups as measured on a variety of factors such as case systems and complexity of conjugations. Additionally, languages spoken by large groups are much more likely to use lexical strategies in place of inflectional morphology to encode evidentiality, negation, aspect, and possession. Our findings indicate that just as biological organisms are shaped by ecological niches, language structures appear to adapt to the environment (niche) in which they are being learned and used. As adults learn a language, features that are difficult for them to acquire are less likely to be passed on to subsequent learners. Languages used for communication in large groups that include adult learners appear to have been subjected to such selection. Conversely, the morphological complexity common to languages used in small groups increases redundancy which may facilitate language learning by infants.

Conclusions/Significance

We hypothesize that language structures are subjected to different evolutionary pressures in different social environments. Just as biological organisms are shaped by ecological niches, language structures appear to adapt to the environment (niche) in which they are being learned and used. The proposed Linguistic Niche Hypothesis has implications for answering the broad question of why languages differ in the way they do and makes empirical predictions regarding language acquisition capacities of children versus adults.
PLoS ONE 5(1):e8681, 2010
We study the viability and resilience of languages, using a simple dynamical model of two languages in competition. Assuming that public action can modify the prestige of a language in order to avoid language extinction, we analyze two cases: (i) the prestige can only take two values, (ii) it can take any value but its change at each time step is bounded. In both cases, we determine the viability kernel, that is, the set of states for which there exists an action policy maintaining the coexistence of the two languages, and we define such policies. We also study the resilience of the languages and identify configurations from where the system can return to the viability kernel (finite resilience), or where one of the languages is led to disappear (zero resilience). Within our current framework, the maintenance of a bilingual society is shown to be possible by introducing the prestige of a language as a control variable.
PLoS ONE 5(3):e9411, 2010
Background

Zipf's law states that the relationship between the frequency of a word in a text and its rank (the most frequent word has rank 1, the 2nd most frequent word has rank 2) is approximately linear when plotted on a double logarithmic scale. It has been argued that the law is not a relevant or useful property of language because simple random texts - constructed by concatenating random characters including blanks behaving as word delimiters - exhibit a Zipf's law-like word rank distribution.

Methodology/Principal Findings

In this article, we examine the flaws of such putative good fits of random texts. We demonstrate - by means of three different statistical tests - that ranks derived from random texts and ranks derived from real texts are statistically inconsistent with the parameters employed to argue for such a good fit, even when the parameters are inferred from the target real text. Our findings are valid for both the simplest random texts composed of equally likely characters as well as more elaborate and realistic versions where character probabilities are borrowed from a real text.

Conclusions/Significance

The good fit of random texts to real Zipf's law-like rank distributions has not yet been established. Therefore, we suggest that Zipf's law might in fact be a fundamental law in natural languages.
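
The rank-frequency relationship described in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure (they report three statistical tests that the abstract does not name); it only ranks words by frequency and estimates the slope of log frequency against log rank by ordinary least squares, which is a crude estimator. The function names `rank_frequency` and `loglog_slope` and the toy text are assumptions made for illustration.

```python
import math
from collections import Counter


def rank_frequency(text):
    """Return (rank, frequency) pairs; rank 1 is the most frequent word."""
    counts = Counter(text.lower().split())
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))


def loglog_slope(pairs):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank, _ in pairs]
    ys = [math.log(freq) for _, freq in pairs]
    n = len(pairs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical toy text whose word frequencies are 12, 6, 4, 3 (12 / rank),
# so the points lie on a straight line in log-log coordinates.
toy = " ".join(["a"] * 12 + ["b"] * 6 + ["c"] * 4 + ["d"] * 3)
pairs = rank_frequency(toy)
slope = loglog_slope(pairs)
```

On real corpora the points only approximately follow a line, and a visually good log-log fit does not by itself establish consistency with Zipf's law, which is the point the abstract makes about random texts.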

PLoS ONE 5(3):e9573, 2010
We recently used computational phylogenetic methods on lexical data to test between two scenarios for the peopling of the Pacific. Our analyses of lexical data supported a pulse-pause scenario of Pacific settlement in which the Austronesian speakers originated in Taiwan around 5,200 years ago and rapidly spread through the Pacific in a series of expansion pulses and settlement pauses. We claimed that there was high congruence between traditional language subgroups and those observed in the language phylogenies, and that the estimated age of the Austronesian expansion at 5,200 years ago was consistent with the archaeological evidence. However, the congruence between the language phylogenies and the evidence from historical linguistics was not quantitatively assessed using tree comparison metrics. The robustness of the divergence time estimates to different calibration points was also not investigated exhaustively. Here we address these limitations by using a systematic tree comparison metric to calculate the similarity between the Bayesian phylogenetic trees and the subgroups proposed by historical linguistics, and by re-estimating the age of the Austronesian expansion using only the most robust calibrations. The results show that the Austronesian language phylogenies are highly congruent with the traditional subgroupings, and the date estimates are robust even when calculated using a restricted set of historical calibrations.
PLoS ONE 5(11):e13718, 2010
Background

Early stone tools provide direct evidence of human cognitive and behavioral evolution that is otherwise unavailable. Proper interpretation of these data requires a robust interpretive framework linking archaeological evidence to specific behavioral and ...
2009
PLoS ONE 4(11):e7678, 2009
Zipf's discovery that word frequency distributions obey a power law established parallels between biological and physical processes, and language, laying the groundwork for a complex systems perspective on human communication. More recent research has also identified scaling regularities in the dynamics underlying the successive occurrences of events, suggesting the possibility of similar findings for language as well.

By considering frequent words in USENET discussion groups and in disparate databases where the language has different levels of formality, here we show that the distributions of distances between successive occurrences of the same word display bursty deviations from a Poisson process and are well characterized by a stretched exponential (Weibull) scaling. The extent of this deviation depends strongly on semantic type - a measure of the logicality of each word - and less strongly on frequency. We develop a generative model of this behavior that fully determines the dynamics of word usage.

Recurrence patterns of words are well described by a stretched exponential distribution of recurrence times, an empirical scaling that cannot be anticipated from Zipf's law. Because the use of words provides a uniquely precise and powerful lens on human thought and activity, our findings also have implications for other overt manifestations of collective human dynamics.
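
The quantities named in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' generative model: `recurrence_distances` (a hypothetical helper) measures the gaps between successive occurrences of a word in a token list, and `weibull_survival` evaluates the standard stretched exponential form exp(-(t/scale)^shape). The parameter names `scale` and `shape` are the conventional Weibull parameters, not values taken from the paper.

```python
import math


def recurrence_distances(tokens, word):
    """Distances (in tokens) between successive occurrences of `word`."""
    positions = [i for i, tok in enumerate(tokens) if tok == word]
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


def weibull_survival(t, scale, shape):
    """Probability that a recurrence distance exceeds t under a Weibull law.

    shape == 1 gives a plain exponential (the memoryless reference case);
    shape < 1 gives a stretched exponential with a heavier tail.
    """
    return math.exp(-((t / scale) ** shape))


# Hypothetical toy sentence: "the" occurs at positions 0, 3 and 6.
tokens = "the cat saw the dog and the bird".split()
gaps = recurrence_distances(tokens, "the")

# Compare tail probabilities at the same distance for two shape values.
tail_exponential = weibull_survival(4.0, scale=1.0, shape=1.0)
tail_stretched = weibull_survival(4.0, scale=1.0, shape=0.5)
```

With the same scale, a shape below 1 assigns more probability to long gaps than the exponential does, which is one way to read the "bursty" deviation from a Poisson process that the abstract describes.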

2008
PLoS ONE 3(3):e1768, 2008
Vocal learning is a critical behavioral substrate for spoken human language. It is a rare trait found in three distantly related groups of birds: songbirds, hummingbirds, and parrots. These avian groups have remarkably similar systems of cerebral vocal nuclei for the control of ...