Language Evolution and Computation Bibliography

Francesca Tria
2015
Journal of Memory and Language 84:205-223, 2015
Several recent theories have suggested that an increase in the number of non-native speakers in a language can lead to changes in morphological rules. We examine this experimentally by contrasting the performance of native and non-native English speakers in a simple Wug-task, showing that non-native speakers are significantly more likely to provide non -ed (i.e., irregular) past-tense forms for novel verbs than native speakers. Both groups are sensitive to sound similarities between new words and existing words (i.e., are more likely to provide irregular forms for novel words which sound similar to existing irregulars). Among both natives and non-natives, irregularizations are non-random; that is, rather than presenting as truly irregular inflectional strategies, they follow identifiable sub-rules present in the highly frequent set of irregular English verbs. Our results shed new light on how native and non-native learners can affect language structure.
2014
Physics of Life Reviews 11(2):311-312, 2014
The debate on language origin and evolution has benefited from a largely interdisciplinary effort, involving linguists, anthropologists and sociologists as well as physicists, mathematicians and computer scientists. A fundamental question is whether a shared communication system can emerge from repeated interactions among individuals, without relying on any a priori or innate language-specific structure. Modeling, and in particular language games, has proved to be a powerful tool to gain insight into this beautiful mystery. In particular, fruitful investigations have been carried out concerning the possibility for a population of individuals to exploit local communication acts to build up a shared vocabulary [1] or a system of linguistic categories reproducing the universality and the hierarchies observed in anthropological data [2–4]. A particular effort has also been devoted to the origin of the complex organization of syntax in hierarchical structures, one of the core design features of human language. As Gong and coauthors highlighted in this review [5], a combinatorial and compositional structure can emerge out of a holistic language due to communication purposes, and explaining how this could possibly happen still represents an intriguing challenge [6–11]. It is important to remark how theoretical investigations should be, and increasingly are, paralleled by careful comparison with data on language formation. Different kinds of data can be exploited to shed light on different questions. Diachronic and historical data related to migration patterns have been used, for instance, to study the timescales of language evolution. Anthropological studies on pre-industrialized populations [12,13] have been crucial in understanding language universals. At the same time, experiments in cognitive science have helped in shedding light on the mechanisms emerging when individuals are called to perform communicative tasks [14].
It is worth mentioning in this perspective how advances in information and communication technologies now make it possible to run focused experiments on the emergence of linguistic structures by exploiting the huge basin of web users. In particular, a general trend is emerging for the adoption of web games as a very interesting laboratory to run experiments in the social sciences and whenever the contribution of human beings is crucially required for research purposes. This is opening tremendous opportunities to monitor the emergence of specific linguistic features and their co-evolution with the structure of our conceptual spaces.
PLoS ONE 9:839-862, 2014
Human languages are rule governed, but almost invariably these rules have exceptions in the form of irregularities. Since rules in language are efficient and productive, the persistence of irregularity is an anomaly. How does irregularity linger in the face of internal (endogenous) and external (exogenous) pressures to conform to a rule? Here we address this problem by taking a detailed look at simple past tense verbs in the Corpus of Historical American English. The data show that the language is open, with many new verbs entering. At the same time, existing verbs might tend to regularize or irregularize as a consequence of internal dynamics, but overall, the amount of irregularity sustained by the language stays roughly constant over time. Despite continuous vocabulary growth, and presumably, an attendant increase in expressive power, there is no corresponding growth in irregularity. We analyze the set of irregulars, showing they may adhere to a set of minority rules, allowing for increased stability of irregularity over time. These findings contribute to the debate on how language systems become rule governed, and how and why they sustain exceptions to rules, providing insight into the interplay between the emergence and maintenance of rules and exceptions in language.
2013
Emergence of fast agreement in an overhearing population: The case of the naming game
Europhysics Letters 101(6):68004, 2013
The naming game (NG) describes the agreement dynamics of a population of N agents interacting locally in pairs, leading to the emergence of a shared vocabulary. This model is relevant to the novel field of semiotic dynamics, and specifically to opinion formation and language evolution. Applications of this model range from spreading algorithms and leader election algorithms in wireless sensor networks to user-based social tagging systems. In this paper, we introduce the concept of overhearing (i.e., at every time step of the game, a random set of N^δ individuals is chosen from the population; they overhear the word transmitted by the speaker and reshape their inventories accordingly). When δ = 0 one recovers the behavior of the original NG. As δ increases, the population of agents reaches agreement faster and with a significantly lower memory requirement. The convergence time to reach global consensus scales as log N as δ approaches 1.
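The overhearing dynamics described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `naming_game`, the reading of the hearer count as N raised to the power δ (rounded, and capped at N - 1), and the rule that the speaker keeps only the spoken word when at least one hearer already knew it are all assumptions made for illustration.

```python
import random


def naming_game(n_agents, delta, max_steps=100_000, seed=0):
    """Return the number of steps until every agent holds the same single word.

    Returns None if no consensus is reached within max_steps.
    Assumption: round(n_agents ** delta) agents overhear each utterance.
    """
    rng = random.Random(seed)
    inventories = [set() for _ in range(n_agents)]
    # Assumed hearer count: N**delta, at least 1 and at most N - 1.
    n_hearers = max(1, min(n_agents - 1, round(n_agents ** delta)))
    next_word = 0  # counter used to invent brand-new words

    for step in range(1, max_steps + 1):
        speaker = rng.randrange(n_agents)
        if not inventories[speaker]:
            # A speaker with an empty inventory invents a new word.
            inventories[speaker].add(next_word)
            next_word += 1
        word = rng.choice(sorted(inventories[speaker]))

        others = [a for a in range(n_agents) if a != speaker]
        hearers = rng.sample(others, n_hearers)

        any_success = False
        for h in hearers:
            if word in inventories[h]:
                # Success: the hearer keeps only the agreed word.
                inventories[h] = {word}
                any_success = True
            else:
                # Failure: the hearer learns the word.
                inventories[h].add(word)
        if any_success:
            # Assumed rule: one successful hearer is enough for the speaker.
            inventories[speaker] = {word}

        # Global consensus: every inventory is exactly {word}.
        if all(inv == {word} for inv in inventories):
            return step
    return None
```

Calling `naming_game(200, 0.0)` and `naming_game(200, 0.5)` with the same seed gives a rough feel for how overhearing changes the time to consensus in this toy version; the scaling reported in the abstract refers to the authors' model, not to this sketch.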
2012
Advances in Complex Systems 15(03n04):1203002, 2012
Thirty authors of different disciplines, ranging from cognitive science and linguistics to mathematics and physics, address the topic of language origin and evolution. Language dynamics is investigated through an interdisciplinary effort, involving field and synthetic experiments, modelling and comparison of the theoretical predictions with empirical data. The result consists in new insights that significantly contribute to the ongoing debate on the origin and the evolution of language. In this Topical Issue the state of the art of this novel and fertile approach is reported by major experts of the field.
PNAS 109(18):6819--6824, 2012
One of the fundamental problems in cognitive science is how humans categorize the visible color spectrum. The empirical evidence of the existence of universal or recurrent patterns in color naming across cultures is paralleled by the observation that color names ...
Advances in Complex Systems 15(03n04):1150016, 2012
It is widely known that color names across the world's languages tend to be organized into a neat hierarchy with a small set of 'basic names' featuring in a comparatively fixed order across linguistic societies. However, to date, the basic names have only been defined through a set of linguistic principles. There is no statistical definition that quantitatively separates the basic names from the rest of the color words across languages. Here we present a rigorous statistical analysis of the World Color Survey database hosting color word information from 110 non-industrialized languages. The central result is that those names for which a population of individuals show a larger overall agreement across languages turn out to be the basic ones exactly reproducing the color name hierarchy and, thereby, providing, for the first time, an empirical definition of the basic color names.
Naming a Structured World: A Cultural Route to Duality of Patterning
PLoS ONE 7(6):e37744, 2012
The lexicons of human languages organize their units at two distinct levels. At a first combinatorial level, meaningless forms (typically referred to as phonemes) are combined into meaningful units (typically referred to as morphemes). Thanks to this, many ...
2011
Journal of Statistical Mechanics: Theory and Experiment, 2011
Language dynamics is a rapidly growing field that focuses on all processes related to the emergence, evolution, change and extinction of languages. Recently, the study of self-organization and evolution of language and meaning has led to the idea that a community of language users can be seen as a complex dynamical system, which collectively solves the problem of developing a shared communication framework through the back-and-forth signaling between individuals.

We shall review some of the progress made in the past few years and highlight potential future directions of research in this area. In particular, the emergence of a common lexicon and of a shared set of linguistic categories will be discussed, as examples corresponding to the early stages of a language. The extent to which synthetic modeling is nowadays contributing to the ongoing debate in cognitive science will be pointed out. In addition, the burst of growth of the web is providing new experimental frameworks. It makes available a huge amount of resources, both as novel tools and data to be analyzed, allowing quantitative and large-scale analysis of the processes underlying the emergence of a collective information and language dynamics.

Physics of Life Reviews 8:371--372, 2011
Three ingredients play a central role in the study of origins and evolution of language and meaning: biological constraints, knowledge transmission between successive generations (vertical transmission) and achievement of a common knowledge within a single ...
PLoS ONE 6(2):e16677, 2011
Human languages evolve continuously, and a puzzling problem is how to reconcile the apparent robustness of most of the deep linguistic structures we use with the evidence that they undergo possibly slow, yet ceaseless, changes. Is the state in which we observe languages today closer to what would be a dynamical attractor with statistically stationary properties or rather closer to a non-steady state slowly evolving in time? Here we address this question in the framework of the emergence of shared linguistic categories in a population of individuals interacting through language games. The observed emerging asymptotic categorization, which has been previously tested - with success - against experimental data from human languages, corresponds to a metastable state where global shifts are always possible but progressively more unlikely and the response properties depend on the age of the system. This aging mechanism exhibits striking quantitative analogies to what is observed in the statistical mechanics of glassy systems. We argue that this can be a general scenario in language dynamics where shared linguistic conventions would not emerge as attractors, but rather as metastable states.
PLoS ONE 6(6):e20109, 2011
Historical linguistics aims at inferring the most likely language phylogenetic tree starting from information concerning the evolutionary relatedness of languages. The available information typically consists of lists of homologous (lexical, phonological, syntactic) features or characters for many different languages: a set of parallel corpora whose compilation represents a paramount achievement in linguistics.

From this perspective the reconstruction of language trees is an example of inverse problems: starting from present, incomplete and often noisy, information, one aims at inferring the most likely past evolutionary history. A fundamental issue in inverse problems is the evaluation of the inference made. A standard way of dealing with this question is to generate data with artificial models in order to have full access to the evolutionary process one is going to infer. This procedure presents an intrinsic limitation: when dealing with real data sets, one typically does not know which model of evolution is the most suitable for them. A possible way out is to compare algorithmic inference with expert classifications. This is the point of view we take here by conducting a thorough survey of the accuracy of reconstruction methods as compared with the Ethnologue expert classifications. We focus in particular on state-of-the-art distance-based methods for phylogeny reconstruction using worldwide linguistic databases.

In order to assess the accuracy of the inferred trees we introduce and characterize two generalizations of standard definitions of distances between trees. Based on these scores we quantify the relative performances of the distance-based algorithms considered. Further we quantify how the completeness and the coverage of the available databases affect the accuracy of the reconstruction. Finally we draw some conclusions about where the accuracy of the reconstructions in historical linguistics stands and about the leading directions to improve it.

Journal of Computational Science 2(4):316--323, 2011
Article history: received 21 December 2010; received in revised form 20 September 2011; accepted 3 October 2011. Keywords: category game, metastable states, no-rejection algorithms, agent-based simulation.