Language Evolution and Computation Bibliography

Niloy Ganguly
2010
Modeling the Redundancy of Human Speech Sound Inventories: An Information Theoretic Approach
Journal of Quantitative Linguistics, 2010
In traditional generative linguistics, the sounds of a language are represented as bundles of binary-valued features. The sounds used in a language are not randomly chosen from a universal repository of phonemes, but are known to be correlated in terms of the features they use. Discovery of these correlation patterns and organizational principles behind the structure of sound inventories has been one of the classic problems in phonology. In this work, we show that the amount of redundancy present in the sound inventory of a language, which is an information theoretic measure reflecting the ratio of the number of distinctive features used in the language to the minimum number of features required to distinguish between the sounds present in the language, lies within a very narrow range irrespective of factors such as the size of the inventory, the language family and the typology. This is a hitherto unreported significant observation that points to a universal structural property of the sound inventories of human languages. This property might be an outcome of self-organization of the sound inventories through the processes of language acquisition and change, or of the way in which phonemes are represented in generative phonology.
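The ratio described in this abstract can be illustrated with a minimal sketch. It rests on two assumptions that are not taken from the paper: a feature counts as "used" when it takes both values somewhere in the inventory, and the minimum number of binary features needed to distinguish N sounds is log2(N). The function name and the toy inventory are hypothetical, and the paper's own measure may be defined differently.

```python
import math

def redundancy_ratio(inventory):
    """Ratio of features used to the assumed minimum log2(N).

    inventory: dict mapping a sound label to a tuple of binary
    feature values (all tuples must have the same length).
    """
    vectors = list(inventory.values())
    n_features = len(vectors[0])
    # Assumption: a feature is "used" if it is not constant across the inventory.
    used = sum(
        1 for i in range(n_features) if len({v[i] for v in vectors}) > 1
    )
    # Assumption: N sounds need at least log2(N) binary features.
    minimum = math.log2(len(vectors))
    return used / minimum

# Hypothetical toy inventory: 4 sounds described by 3 binary features.
toy_inventory = {
    "p": (0, 0, 0),
    "b": (1, 0, 0),
    "t": (0, 1, 0),
    "m": (1, 0, 1),
}
ratio = redundancy_ratio(toy_inventory)
```

Here three features vary across four sounds, so the toy ratio is 3 / 2; a value above 1 means the inventory uses more features than strictly necessary.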
2009
Language Diversity across the Consonant Inventories: A Study in the Framework of Complex Networks
EACL 2009 Workshop on Cognitive Aspects of Computational Language Acquisition, 2009
In this paper, we attempt to explain the emergence of the linguistic diversity that exists across the consonant inventories of some of the major language families of the world through a complex network based growth model. There is only a single parameter for this model that is meant to introduce a small amount of randomness in the otherwise preferential attachment based growth process. Experiments with this model parameter indicate that the choice of consonants among the languages within a family is far more preferential than it is across families. The implications of this result are twofold -- (a) there is an innate preference of the speakers towards acquiring certain linguistic structures over others and (b) shared ancestry propels a stronger preferential connection between the languages within a family than across them. Furthermore, our observations indicate that this parameter might bear a correlation with the period of existence of the language families under investigation.
Journal of Quantitative Linguistics 16(2):157-184, 2009
The sound inventories of the world's languages self-organize, giving rise to similar cross-linguistic patterns. In this work we attempt to capture this phenomenon of self-organization, which shapes the structure of the consonant inventories, through a complex network approach. For this purpose we define the occurrence and co-occurrence networks of consonants and systematically study some of their important topological properties. A crucial observation is that the occurrence as well as the co-occurrence of consonants across languages follow a power law distribution. This property is arguably a consequence of the principle of preferential attachment. In order to support this argument we propose a synthesis model which reproduces the degree distribution for the networks to a close approximation. We further observe that the co-occurrence network of consonants shows a high degree of clustering and subsequently refine our synthesis model in order to incorporate this property. Finally, we discuss how preferential attachment manifests itself through the evolutionary nature of language.
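As a rough illustration of the occurrence and co-occurrence networks mentioned in this abstract, the sketch below counts, for a toy data set, how many languages each consonant occurs in and how many languages each pair of consonants shares. The function name, the language labels and the inventories are hypothetical, and the paper's exact edge weighting may differ.

```python
from collections import Counter
from itertools import combinations

def build_networks(inventories):
    """Return (occurrence, cooccurrence) counts from language inventories.

    inventories: dict mapping a language name to a list of consonants.
    occurrence[c] is the number of languages containing consonant c.
    cooccurrence[(a, b)] is the number of languages containing both,
    with each pair stored in sorted order.
    """
    occurrence = Counter()
    cooccurrence = Counter()
    for consonants in inventories.values():
        unique = sorted(set(consonants))
        occurrence.update(unique)
        cooccurrence.update(combinations(unique, 2))
    return occurrence, cooccurrence

# Hypothetical toy data, not taken from any real inventory database.
toy = {
    "lang_a": ["p", "t", "k"],
    "lang_b": ["p", "t", "m"],
    "lang_c": ["p", "k", "s"],
}
occ, cooc = build_networks(toy)
```

The degree distribution studied in the abstract would correspond to the distribution of the values in `occ` (and of the edge weights in `cooc`) over a full inventory database.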
2008
Modeling the Structure and Dynamics of the Consonant Inventories: A Complex Network Approach
Proceedings of COLING-08, 2008
We study the self-organization of the consonant inventories through a complex network approach. We observe that the distribution of occurrence as well as co-occurrence of the consonants across languages follows power-law behavior. The co-occurrence network of consonants exhibits a high clustering coefficient. We propose four novel synthesis models for these networks (each of which is a refinement of the earlier one) so as to successively match with higher accuracy (a) the above-mentioned topological properties as well as (b) the linguistic property of feature economy exhibited by the consonant inventories. We conclude by arguing that a possible interpretation of this mechanism of network growth is the process of child language acquisition. Such models essentially increase our understanding of the structure of languages that is influenced by their evolutionary dynamics and this, in turn, can be extremely useful for building future NLP applications.
Advances in Complex Systems 11(3):371-392, 2008
In this work, we attempt to capture patterns of co-occurrence across vowel systems and at the same time figure out the nature of the force leading to the emergence of such patterns. For this purpose we define a weighted network where the vowels are the nodes and an edge between two nodes (read vowels) signifies their co-occurrence likelihood over the vowel inventories. Through this network we identify communities of vowels, which essentially reflect their patterns of co-occurrence across languages. We observe that in the assortative vowel communities the constituent nodes (read vowels) are largely uncorrelated in terms of their features and show that they are formed based on the principle of maximal perceptual contrast. However, in the rest of the communities, strong correlations are reflected among the constituent vowels with respect to their features, indicating that it is the principle of feature economy that binds them together. We validate the above observations by proposing a quantitative measure of perceptual contrast as well as feature economy and subsequently comparing the results obtained from these quantifications with those where we assume that the vowel inventories had evolved just by chance.
2007
International Journal of Modern Physics C 18(2):281-295, 2007
Speech sounds of the languages all over the world show remarkable patterns of co-occurrence. In this work, we attempt to automatically capture the patterns of co-occurrence of the consonants across languages and at the same time figure out the nature of the force leading to the emergence of such patterns. For this purpose we define a weighted network where the consonants are the nodes and an edge between two nodes (read consonants) signifies their co-occurrence likelihood over the consonant inventories. Through this network we identify communities of consonants that essentially reflect their patterns of co-occurrence across languages. We test the goodness of the communities and observe that the constituent consonants frequently occur in such groups in real languages also. Interestingly, the consonants forming these communities reflect strong correlations in terms of their features, which indicates that the principle of feature economy acts as a driving force towards community formation. In order to measure the strength of this force we propose an information theoretic definition of feature economy and show that indeed the feature economy exhibited by the consonant communities is substantially better than it would be if the consonant inventories had evolved just by chance.
Redundancy ratio: an invariant property of the consonant inventories of the world's languages
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, 2007
In this paper, we put forward an information theoretic definition of the redundancy that is observed across the sound inventories of the world's languages. Through rigorous statistical analysis, we find that this redundancy is an invariant property of the consonant inventories. The statistical analysis further reveals that the vowel inventories do not exhibit any such property, which in turn points to the fact that the organizing principles of the vowel and the consonant inventories are quite different in nature.
Emergence of community structures in vowel inventories: an analysis based on complex networks
Proceedings of Ninth Meeting of the ACL Special Interest Group in Computational Morphology and Phonology, 2007
In this work, we attempt to capture patterns of co-occurrence across vowel systems and at the same time figure out the nature of the force leading to the emergence of such patterns. For this purpose we define a weighted network where the vowels are the nodes and an edge between two nodes (read vowels) signifies their co-occurrence likelihood over the vowel inventories. Through this network we identify communities of vowels, which essentially reflect their patterns of co-occurrence across languages. We observe that in the assortative vowel communities the constituent nodes (read vowels) are largely uncorrelated in terms of their features, indicating that they are formed based on the principle of maximal perceptual contrast. However, in the rest of the communities, strong correlations are reflected among the constituent vowels with respect to their features, indicating that it is the principle of feature economy that binds them together.
2006
Analysis and Synthesis of the Distribution of Consonants over Languages: A Complex Network Approach
COLING-ACL06, 2006
Cross-linguistic similarities are reflected by the speech sound systems of languages all over the world. In this work we try to model such similarities observed in the consonant inventories through a complex bipartite network. We present a systematic study of some of the appealing features of these inventories with the help of the bipartite network. An important observation is that the occurrence of consonants follows a two-regime power law distribution. We find that the consonant inventory size distribution together with the principle of preferential attachment are the main reasons behind the emergence of such two-regime behavior. In order to further support our explanation we present a synthesis model for this network based on the general theory of preferential attachment.
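A preferential-attachment synthesis of a language-consonant bipartite network might look roughly like the sketch below. It is a generic illustration under stated assumptions, not the model from the paper: languages arrive one at a time with given inventory sizes, and each picks distinct consonants with probability proportional to the consonant's current degree plus a smoothing constant. The names `synthesize` and `epsilon`, and the example sizes, are hypothetical.

```python
import random

def synthesize(inventory_sizes, n_consonants, epsilon=1.0, seed=0):
    """Grow a bipartite language-consonant network preferentially.

    inventory_sizes: number of consonants each new language selects.
    n_consonants: size of the consonant pool (indices 0..n-1).
    epsilon: assumed smoothing constant so that unused consonants
             can still be chosen.
    Returns (inventories, degree), where degree[c] is the number of
    languages that contain consonant c.
    """
    if max(inventory_sizes) > n_consonants:
        raise ValueError("inventory size exceeds consonant pool")
    rng = random.Random(seed)
    degree = [0] * n_consonants
    inventories = []
    for size in inventory_sizes:
        chosen = set()
        while len(chosen) < size:
            candidates = [c for c in range(n_consonants) if c not in chosen]
            # Selection weight grows with how often a consonant is already used.
            weights = [degree[c] + epsilon for c in candidates]
            chosen.add(rng.choices(candidates, weights=weights, k=1)[0])
        for c in chosen:
            degree[c] += 1
        inventories.append(chosen)
    return inventories, degree

# Hypothetical inventory sizes for five synthetic languages.
sizes = [3, 5, 4, 6, 3]
inventories, degree = synthesize(sizes, n_consonants=10)
```

In this sketch the inventory sizes are supplied as input, which mirrors the abstract's point that the inventory size distribution and preferential attachment jointly shape the resulting degree distribution.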