Language Evolution and Computation Bibliography

Our site (www.isrl.uiuc.edu/amag/langev) has been retired; please use https://langev.com instead.
Animesh Mukherjee
2013
Emergence of fast agreement in an overhearing population: The case of the naming game
Europhysics Letters 101(6):68004, 2013
The naming game (NG) describes the agreement dynamics of a population of N agents interacting locally in pairs, leading to the emergence of a shared vocabulary. The model is relevant to the novel field of semiotic dynamics and specifically to opinion formation and language evolution. Its applications range from wireless sensor networks (as spreading algorithms and leader election algorithms) to user-based social tagging systems. In this paper, we introduce the concept of overhearing (i.e., at every time step of the game, a random set of N^δ individuals is chosen from the population; these individuals overhear the word transmitted by the speaker and reshape their inventories accordingly). When δ = 0 one recovers the behavior of the original NG. As δ increases, the population of agents reaches agreement faster and with a significantly lower memory requirement. The convergence time to reach global consensus scales as log N as δ approaches 1.
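The overhearing dynamics described in this abstract can be illustrated with a minimal simulation sketch. It is not the authors' implementation: the function name `naming_game`, the reading of the overhearing set size as N raised to the power δ (rounded and capped at N - 1), and the rule that the speaker collapses its inventory when at least one listener already knew the word are all assumptions made for illustration.

```python
import random

def naming_game(n_agents, delta, seed=0, max_steps=1_000_000):
    """Return the step at which all agents hold the same single word, or None."""
    rng = random.Random(seed)
    inventories = [set() for _ in range(n_agents)]
    # Assumption: the number of listeners per step is N**delta, at least 1.
    # With delta = 0 this gives one hearer, i.e. the pairwise naming game.
    n_listeners = max(1, min(n_agents - 1, round(n_agents ** delta)))
    next_word = 0
    for step in range(1, max_steps + 1):
        speaker = rng.randrange(n_agents)
        if not inventories[speaker]:
            # A speaker with an empty inventory invents a new word.
            inventories[speaker].add(next_word)
            next_word += 1
        word = rng.choice(sorted(inventories[speaker]))
        others = [a for a in range(n_agents) if a != speaker]
        success = False
        for listener in rng.sample(others, n_listeners):
            if word in inventories[listener]:
                # Known word: the listener keeps only this word.
                inventories[listener] = {word}
                success = True
            else:
                # Unknown word: the listener adds it to its inventory.
                inventories[listener].add(word)
        if success:
            # Assumed rule: the speaker also collapses on any success.
            inventories[speaker] = {word}
        if all(inv == {word} for inv in inventories):
            return step
    return None
```

Calling `naming_game(200, 0.0)` and `naming_game(200, 0.8)` for several seeds gives a rough feel for how the consensus time changes with δ under these assumed rules.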
2012
Opinion formation in time-varying social networks: The case of the naming game
Physical Review E 86(3):036110, 2012
Social networks are inherently dynamic. Social interactions and human activities are intermittent, the neighborhood of individuals moving over a geographic space evolves over time, links appear and disappear in the World Wide Web. The essence of social network ...
PNAS 109(18):6819--6824, 2012
One of the fundamental problems in cognitive science is how humans categorize the visible color spectrum. The empirical evidence of the existence of universal or recurrent patterns in color naming across cultures is paralleled by the observation that color names ...
Advances in Complex Systems 15(03n04):1150016, 2012
It is widely known that color names across the world's languages tend to be organized into a neat hierarchy with a small set of 'basic names' featuring in a comparatively fixed order across linguistic societies. However, to date, the basic names have only been defined through a set of linguistic principles. There is no statistical definition that quantitatively separates the basic names from the rest of the color words across languages. Here we present a rigorous statistical analysis of the World Color Survey database hosting color word information from 110 non-industrialized languages. The central result is that those names for which a population of individuals show a larger overall agreement across languages turn out to be the basic ones exactly reproducing the color name hierarchy and, thereby, providing, for the first time, an empirical definition of the basic color names.
Privacy, Security, Risk and Trust (PASSAT), pages 508--513, 2012
We study the dynamics of the Naming Game as an opinion formation model on social networks. This agent-based model captures the essential features of the agreement dynamics by means of a memory-based negotiation process. Our study focuses on the impact of dominance of certain opinions over others in pursuit of faster agreement on social networks. We propose two models to incorporate dominance of the opinions. We observe that both these models lead to faster agreement among the agents on an opinion as compared to the base case reported in the literature. We perform extensive simulations on computer-generated networks as well as on a real online social network (Facebook) and in both cases the dominance based models converge significantly faster than the base case.
2011
Journal of Statistical Mechanics: Theory and Experiment, 2011
Language dynamics is a rapidly growing field that focuses on all processes related to the emergence, evolution, change and extinction of languages. Recently, the study of self-organization and evolution of language and meaning has led to the idea that a community of language users can be seen as a complex dynamical system, which collectively solves the problem of developing a shared communication framework through the back-and-forth signaling between individuals.

We shall review some of the progress made in the past few years and highlight potential future directions of research in this area. In particular, the emergence of a common lexicon and of a shared set of linguistic categories will be discussed, as examples corresponding to the early stages of a language. The extent to which synthetic modeling is nowadays contributing to the ongoing debate in cognitive science will be pointed out. In addition, the burst of growth of the web is providing new experimental frameworks. It makes available a huge amount of resources, both as novel tools and data to be analyzed, allowing quantitative and large-scale analysis of the processes underlying the emergence of a collective information and language dynamics.

PLoS ONE 6(2):e16677, 2011
Human languages evolve continuously, and a puzzling problem is how to reconcile the apparent robustness of most of the deep linguistic structures we use with the evidence that they undergo possibly slow, yet ceaseless, changes. Is the state in which we observe languages today closer to what would be a dynamical attractor with statistically stationary properties or rather closer to a non-steady state slowly evolving in time? Here we address this question in the framework of the emergence of shared linguistic categories in a population of individuals interacting through language games. The observed emerging asymptotic categorization, which has been previously tested - with success - against experimental data from human languages, corresponds to a metastable state where global shifts are always possible but progressively more unlikely and the response properties depend on the age of the system. This aging mechanism exhibits striking quantitative analogies to what is observed in the statistical mechanics of glassy systems. We argue that this can be a general scenario in language dynamics where shared linguistic conventions would not emerge as attractors, but rather as metastable states.
Journal of Computational Science 2(4):316--323, 2011
Article history: received 21 December 2010; received in revised form 20 September 2011; accepted 3 October 2011. Keywords: category game; metastable states; no-rejection algorithms; agent-based simulation.
2010
Global topology of word co-occurrence networks: Beyond the two-regime power-law
Proceedings of COLING10, 2010
Word co-occurrence networks are one of the most common linguistic networks studied in the past and they are known to exhibit several interesting topological characteristics. In this article, we investigate the global topological properties of word co-occurrence networks and, in particular, present a detailed study of their spectrum. Our experiments reveal certain universal trends found across the networks for seven different languages from three different language families, which are neither reported nor explained by any of the previous studies and models of word co-occurrence networks. We hypothesize that since word co-occurrences are governed by syntactic properties of a language, the network has a much more constrained topology than that predicted by the previously proposed growth model. A deeper empirical and theoretical investigation into the evolution of these networks further suggests that they have a core-periphery structure, where the core hardly evolves with time and new words are only attached to the periphery of the network. These properties are fundamental to the nature of word co-occurrence across languages.
Modeling the Redundancy of Human Speech Sound Inventories: An Information Theoretic Approach
Journal of Quantitative Linguistics, 2010
In traditional generative linguistics, the sounds of a language are represented as bundles of binary-valued features. The sounds used in a language are not randomly chosen from a universal repository of phonemes, but are known to be correlated in terms of the features they use. Discovery of these correlation patterns and organizational principles behind the structure of sound inventories has been one of the classic problems in phonology. In this work, we show that the amount of redundancy present in the sound inventory of a language, which is an information theoretic measure reflecting the ratio of the number of distinctive features used in the language to that of the minimum number of features required to distinguish between the sounds present in the language, lies within a very narrow range irrespective of factors such as the size of the inventory, the language family and the typology. This is a hitherto unreported significant observation that points to a universal structural property of the sound inventories of human languages. This property might be an outcome of self-organization of the sound inventories through the processes of language acquisition and change, or of the way in which phonemes are represented in generative phonology.
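The ratio described in this abstract can be illustrated with a small counting sketch. This is a simplification rather than the paper's measure, which is described as information theoretic: here a feature counts as "used" if it takes both values somewhere in the inventory, and the minimum number of binary features for n sounds is taken as the ceiling of log2(n). The function name `redundancy_ratio` and the toy feature vectors are hypothetical.

```python
import math

def redundancy_ratio(inventory):
    """inventory maps each sound to a tuple of binary feature values."""
    sounds = list(inventory.values())
    if len(sounds) < 2:
        raise ValueError("need at least two sounds to compute a ratio")
    n_features = len(sounds[0])
    # Assumption: a feature is "used" if it distinguishes at least two sounds.
    used = sum(
        1 for i in range(n_features) if len({s[i] for s in sounds}) > 1
    )
    # Minimum number of binary features needed to tell n sounds apart.
    minimum = math.ceil(math.log2(len(sounds)))
    return used / minimum

# Hypothetical toy inventory; features are (voiced, labial, nasal).
toy = {
    "p": (0, 1, 0),
    "b": (1, 1, 0),
    "t": (0, 0, 0),
    "n": (1, 0, 1),
}
print(redundancy_ratio(toy))
```

In the toy inventory, three features vary across four sounds, while two binary features would suffice, so the ratio is above 1.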
2009
Language Diversity across the Consonant Inventories: A Study in the Framework of Complex Networks
EACL 2009 Workshop on Cognitive Aspects of Computational Language Acquisition, 2009
In this paper, we attempt to explain the emergence of the linguistic diversity that exists across the consonant inventories of some of the major language families of the world through a complex network based growth model. There is only a single parameter for this model that is meant to introduce a small amount of randomness in the otherwise preferential attachment based growth process. The experiments with this model parameter indicate that the choice of consonants among the languages within a family is far more preferential than it is across the families. The implications of this result are twofold -- (a) there is an innate preference of the speakers towards acquiring certain linguistic structures over others and (b) shared ancestry propels the stronger preferential connection between the languages within a family than across them. Furthermore, our observations indicate that this parameter might bear a correlation with the period of existence of the language families under investigation.
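A growth process of the kind this abstract describes, preferential attachment with one randomness parameter, might be sketched as follows. The exact form of the model is not given here, so the attachment weight (current degree plus a constant `epsilon`), the fixed inventory size, and the function name `grow_inventories` are assumptions made for illustration.

```python
import random

def grow_inventories(n_languages, inventory_size, n_consonants, epsilon, seed=0):
    """Grow a language-consonant bipartite network one language at a time."""
    if epsilon <= 0 or inventory_size > n_consonants:
        raise ValueError("need epsilon > 0 and inventory_size <= n_consonants")
    rng = random.Random(seed)
    degree = [0] * n_consonants  # number of languages using each consonant
    inventories = []
    for _ in range(n_languages):
        chosen = set()
        while len(chosen) < inventory_size:
            candidates = [c for c in range(n_consonants) if c not in chosen]
            # Assumed kernel: weight = degree + epsilon. A large epsilon makes
            # the choice nearly uniform; a small one makes it preferential.
            weights = [degree[c] + epsilon for c in candidates]
            chosen.add(rng.choices(candidates, weights=weights, k=1)[0])
        for c in chosen:
            degree[c] += 1
        inventories.append(chosen)
    return inventories, degree

inventories, degree = grow_inventories(100, 10, 50, epsilon=0.5)
print(sorted(degree, reverse=True)[:5])
```

Under this assumed kernel, lowering `epsilon` concentrates the languages on a smaller set of frequently chosen consonants.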
The Structure and Dynamics of Linguistic Networks
Dynamics on and of Complex Networks: Applications to Biology, Computer Science, Economics, and the Social Sciences, pages 145--166, 2009
This survey is meant to explore the structure and dynamics of natural languages in the framework of complex networks. We begin with a description of lexical networks, where the nodes are words and edges represent lexical relationship between two words such as phonetic and semantic similarity. This is followed by an overview of various networks where again the nodes are the words, but unlike the case of lexical networks, the edges represent their co-occurrences in similar context. These networks are representations of the interactions among words as governed by the grammar rules of a language. Next we discuss the properties of phonological networks, where the nodes are sub-lexical units such as phonemes or syllables. Applications of linguistic networks in Natural Language Processing (NLP) and Information Retrieval (IR) are also discussed. We conclude the survey by enumerating some open problems in the area of linguistic networks.
Journal of Quantitative Linguistics 16(2):157-184, 2009
The sound inventories of the world's languages self-organize, giving rise to similar cross-linguistic patterns. In this work we attempt to capture this phenomenon of self-organization, which shapes the structure of the consonant inventories, through a complex network approach. For this purpose we define the occurrence and co-occurrence networks of consonants and systematically study some of their important topological properties. A crucial observation is that the occurrence as well as the co-occurrence of consonants across languages follow a power law distribution. This property is arguably a consequence of the principle of preferential attachment. In order to support this argument we propose a synthesis model which reproduces the degree distribution for the networks to a close approximation. We further observe that the co-occurrence network of consonants shows a high degree of clustering and subsequently refine our synthesis model in order to incorporate this property. Finally, we discuss how preferential attachment manifests itself through the evolutionary nature of language.
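The occurrence and co-occurrence networks mentioned in this abstract can be illustrated by counting over a set of inventories. The sketch below assumes that a consonant's occurrence is the number of languages containing it and that the weight of a co-occurrence edge is the number of languages containing both consonants; the function name `build_networks` and the toy inventories are hypothetical.

```python
from collections import Counter
from itertools import combinations

def build_networks(inventories):
    """inventories maps a language name to an iterable of its consonants."""
    occurrence = Counter()    # consonant -> number of languages containing it
    cooccurrence = Counter()  # (c1, c2) with c1 < c2 -> number of shared languages
    for consonants in inventories.values():
        unique = sorted(set(consonants))
        occurrence.update(unique)
        # Every unordered pair within one inventory adds 1 to that edge weight.
        cooccurrence.update(combinations(unique, 2))
    return occurrence, cooccurrence

# Hypothetical toy data, not taken from any real database.
toy = {"lang_a": ["p", "t", "k"], "lang_b": ["p", "t"], "lang_c": ["p", "m"]}
occurrence, cooccurrence = build_networks(toy)
print(occurrence.most_common())
print(cooccurrence.most_common())
```

Storing each edge under a sorted pair keeps the network undirected without a separate graph library.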
Discovering Global Patterns in Linguistic Networks through Spectral Analysis: A Case Study of the Consonant Inventories
EACL 2009, pages 585-593, 2009
Recent research has shown that language and the socio-cognitive phenomena associated with it can be aptly modeled and visualized through networks of linguistic entities. However, most of the existing works on linguistic networks focus only on the local properties of the networks. This study is an attempt to analyze the structure of languages via a purely structural technique, namely spectral analysis, which is ideally suited for discovering the global correlations in a network. Application of this technique to PhoNet, the co-occurrence network of consonants, not only reveals several natural linguistic principles governing the structure of the consonant inventories, but is also able to quantify their relative importance. We believe that this powerful technique can be successfully applied, in general, to study the structure of natural languages.
Self-Organization of Speech Sound Inventories in the Framework of Complex Networks
Indian Institute of Technology Kharagpur, 2009
The sound inventories of the world's languages show a considerable extent of symmetry. It has been postulated that this symmetry is a reflection of the human physiological, cognitive and societal factors. There have been a large number of linguistically motivated studies in order to explain the self-organization of these inventories that arguably leads to the emergence of this symmetry. A few computational models in order to explain especially the structure of the smaller vowel inventories have also been proposed in the literature. However, there is a need for a single unified computational framework for studying the self-organization of the vowel as well as other inventories of complex utterances like consonants and syllables.

In this thesis, we reformulate this problem in the light of statistical mechanics and present complex network representations of these inventories. The central objective of the thesis is to study and explain the self-organization and emergence of the consonant inventories. Nevertheless, in order to demonstrate the versatility of our modeling methodology, we further apply it to investigate and detect certain interesting properties of the vowel inventories.

Two types of networks are considered: a language-consonant bipartite network and a consonant-consonant co-occurrence network. The networks are constructed from the UCLA Phonological Segment Inventory Database (UPSID). From the systematic analysis of these networks we find that the occurrence and co-occurrence of the consonants over languages follow a well-behaved probability distribution. The co-occurrence network also exhibits a high clustering coefficient. We propose different synthetic models of network growth based on preferential attachment so as to successively match with higher accuracy the different statistical properties of the networks. Furthermore, in order to have a deeper understanding of the growth dynamics we analytically solve the models to derive expressions for the emergent degree distribution and clustering coefficient. The co-occurrence network also exhibits strong community structures and a careful inspection indicates that the driving force behind the community formation is grounded in the human articulatory and perceptual factors. In order to quantitatively validate the above principle, we introduce an information theoretic definition of this factor, feature entropy, and show that the natural language inventories are significantly different in terms of this quantity from the randomly generated ones. We further construct similar networks for the vowel inventories and study various interesting similarities as well as differences between them and the consonant inventories.

To summarize, this thesis shows that complex networks can be suitably used to study the self-organization of the human speech sound inventories. In this light, we deem this computational framework as a highly powerful tool in future for modeling and explaining the emergence of many other complex linguistic phenomena.

2008
Modeling the Structure and Dynamics of the Consonant Inventories: A Complex Network Approach
Proceedings of COLING-08, 2008
We study the self-organization of the consonant inventories through a complex network approach. We observe that the distribution of occurrence as well as co-occurrence of the consonants across languages follows power-law behavior. The co-occurrence network of consonants exhibits a high clustering coefficient. We propose four novel synthesis models for these networks (each of which is a refinement of the earlier) so as to successively match with higher accuracy (a) the above mentioned topological properties as well as (b) the linguistic property of feature economy exhibited by the consonant inventories. We conclude by arguing that a possible interpretation of this mechanism of network growth is the process of child language acquisition. Such models essentially increase our understanding of the structure of languages that is influenced by their evolutionary dynamics and this, in turn, can be extremely useful for building future NLP applications.
Advances in Complex Systems 11(3):371-392, 2008
In this work, we attempt to capture patterns of co-occurrence across vowel systems and at the same time figure out the nature of the force leading to the emergence of such patterns. For this purpose we define a weighted network where the vowels are the nodes and an edge between two nodes (read vowels) signifies their co-occurrence likelihood over the vowel inventories. Through this network we identify communities of vowels, which essentially reflect their patterns of co-occurrence across languages. We observe that in the assortative vowel communities the constituent nodes (read vowels) are largely uncorrelated in terms of their features and show that they are formed based on the principle of maximal perceptual contrast. However, in the rest of the communities, strong correlations are reflected among the constituent vowels with respect to their features indicating that it is the principle of feature economy that binds them together. We validate the above observations by proposing a quantitative measure of perceptual contrast as well as feature economy and subsequently comparing the results obtained due to these quantifications with those where we assume that the vowel inventories had evolved just by chance.
2007
International Journal of Modern Physics C 18(2):281-295, 2007
Speech sounds of the languages all over the world show remarkable patterns of co-occurrence. In this work, we attempt to automatically capture the patterns of co-occurrence of the consonants across languages and at the same time figure out the nature of the force leading to the emergence of such patterns. For this purpose we define a weighted network where the consonants are the nodes and an edge between two nodes (read consonants) signifies their co-occurrence likelihood over the consonant inventories. Through this network we identify communities of consonants that essentially reflect their patterns of co-occurrence across languages. We test the goodness of the communities and observe that the constituent consonants frequently occur in such groups in real languages also. Interestingly, the consonants forming these communities reflect strong correlations in terms of their features, which indicates that the principle of feature economy acts as a driving force towards community formation. In order to measure the strength of this force we propose an information theoretic definition of feature economy and show that indeed the feature economy exhibited by the consonant communities is substantially better than it would be if the consonant inventories had evolved just by chance.
Redundancy ratio: an invariant property of the consonant inventories of the world's languages
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, 2007
In this paper, we put forward an information theoretic definition of the redundancy that is observed across the sound inventories of the world's languages. Through rigorous statistical analysis, we find that this redundancy is an invariant property of the consonant inventories. The statistical analysis further unfolds that the vowel inventories do not exhibit any such property, which in turn points to the fact that the organizing principles of the vowel and the consonant inventories are quite different in nature.
Emergence of community structures in vowel inventories: an analysis based on complex networks
Proceedings of Ninth Meeting of the ACL Special Interest Group in Computational Morphology and Phonology, 2007
In this work, we attempt to capture patterns of co-occurrence across vowel systems and at the same time figure out the nature of the force leading to the emergence of such patterns. For this purpose we define a weighted network where the vowels are the nodes and an edge between two nodes (read vowels) signifies their co-occurrence likelihood over the vowel inventories. Through this network we identify communities of vowels, which essentially reflect their patterns of co-occurrence across languages. We observe that in the assortative vowel communities the constituent nodes (read vowels) are largely uncorrelated in terms of their features indicating that they are formed based on the principle of maximal perceptual contrast. However, in the rest of the communities, strong correlations are reflected among the constituent vowels with respect to their features indicating that it is the principle of feature economy that binds them together.
2006
Analysis and Synthesis of the Distribution of Consonants over Languages: A Complex Network Approach
COLING-ACL06, 2006
Cross-linguistic similarities are reflected by the speech sound systems of languages all over the world. In this work we try to model such similarities observed in the consonant inventories through a complex bipartite network. We present a systematic study of some of the appealing features of these inventories with the help of the bipartite network. An important observation is that the occurrence of consonants follows a two-regime power-law distribution. We find that the consonant inventory size distribution together with the principle of preferential attachment are the main reasons behind the emergence of such two-regime behavior. In order to further support our explanation we present a synthesis model for this network based on the general theory of preferential attachment.