Monojit Choudhury

Global topology of word co-occurrence networks: Beyond the two-regime power-law

Proceedings of COLING-10, 2010

Word co-occurrence networks are among the most commonly studied linguistic networks, and they are known to exhibit several interesting topological characteristics. In this article, we investigate the global topological properties of word co-occurrence networks and, in particular, present a detailed study of their spectrum. Our experiments reveal certain universal trends found across the networks for seven different languages from three different language families, which have been neither reported nor explained by any of the previous studies and models of word co-occurrence networks. We hypothesize that since word co-occurrences are governed by the syntactic properties of a language, the network has a much more constrained topology than that predicted by the previously proposed growth model. A deeper empirical and theoretical investigation into the evolution of these networks further suggests that they have a core-periphery structure, where the core hardly evolves with time and new words are attached only to the periphery of the network. These properties are fundamental to the nature of word co-occurrence across languages.

Modeling the Redundancy of Human Speech Sound Inventories: An Information Theoretic Approach

A Mukherjee, M Choudhury, A Basu, N Ganguly

Journal of Quantitative Linguistics, 2010

In traditional generative linguistics, the sounds of a language are represented as bundles of binary-valued features. The sounds used in a language are not randomly chosen from a universal repository of phonemes, but are known to be correlated in terms of the features they use. Discovering these correlation patterns and the organizational principles behind the structure of sound inventories has been one of the classic problems in phonology. In this work, we show that the amount of redundancy present in the sound inventory of a language, an information-theoretic measure reflecting the ratio of the number of distinctive features used in the language to the minimum number of features required to distinguish between the sounds present in the language, lies within a very narrow range irrespective of factors such as the size of the inventory, the language family and the typology. This is a hitherto unreported and significant observation that points to a universal structural property of the sound inventories of human languages. This property might be an outcome of the self-organization of sound inventories through the processes of language acquisition and change, or of the way in which phonemes are represented in generative phonology.
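
The redundancy measure described in this abstract can be illustrated with a minimal Python sketch. It assumes, as our own reading rather than a formula quoted from the paper, that the minimum number of binary features needed to distinguish N sounds is log2 N. The function name `redundancy_ratio`, the toy inventory and its feature labels are all hypothetical.

```python
import math

def redundancy_ratio(inventory):
    """Ratio of distinctive features used to the information-theoretic minimum.

    inventory: dict mapping each sound to the set of features it carries.
    Assumption (not taken from the paper): the minimum number of binary
    features needed to tell N sounds apart is log2(N).
    """
    n = len(inventory)
    if n < 2:
        raise ValueError("need at least two sounds to distinguish")
    features_used = set().union(*inventory.values())
    return len(features_used) / math.log2(n)

# Hypothetical 4-consonant inventory with made-up feature labels.
toy = {
    "p": {"bilabial", "plosive"},
    "b": {"bilabial", "plosive", "voiced"},
    "t": {"alveolar", "plosive"},
    "d": {"alveolar", "plosive", "voiced"},
}
print(redundancy_ratio(toy))  # 4 features / log2(4) = 2.0
```

A value near 1 would mean the inventory uses close to the fewest features possible; larger values indicate more redundancy under this assumed definition.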

Language Diversity across the Consonant Inventories: A Study in the Framework of Complex Networks

M Choudhury, A Mukherjee, A Basu, N Ganguly, A Garg, V Jalan

EACL 2009 Workshop on Cognitive Aspects of Computational Language Acquisition, 2009

In this paper, we attempt to explain the emergence of the linguistic diversity that exists across the consonant inventories of some of the major language families of the world through a complex-network-based growth model. The model has only a single parameter, which is meant to introduce a small amount of randomness into the otherwise preferential-attachment-based growth process. Experiments with this parameter indicate that the choice of consonants is far more preferential among the languages within a family than across families. The implications of this result are twofold: (a) speakers have an innate preference for acquiring certain linguistic structures over others, and (b) shared ancestry drives the stronger preferential connection between languages within a family than across families. Furthermore, our observations indicate that this parameter might be correlated with the period of existence of the language families under investigation.
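
A minimal Python sketch of a growth process of this kind follows. It assumes (our assumption, not the paper's exact model) that each new language draws a fixed number of distinct consonants, choosing a consonant whose current degree is k with probability proportional to k + epsilon, where epsilon plays the role of the single randomness parameter. The function `grow_bipartite`, the pool size and the inventory sizes are hypothetical.

```python
import random

def grow_bipartite(inventory_sizes, n_consonants, epsilon, seed=0):
    """Grow a hypothetical language-consonant bipartite network.

    Each new language draws `size` distinct consonants; a consonant with
    current degree k (number of earlier languages containing it) is picked
    with probability proportional to k + epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive so unseen consonants can be chosen")
    rng = random.Random(seed)
    degree = [0] * n_consonants
    languages = []
    for size in inventory_sizes:
        if size > n_consonants:
            raise ValueError("inventory larger than the consonant pool")
        chosen = set()
        while len(chosen) < size:
            # Only consonants not yet in this inventory are candidates.
            candidates = [c for c in range(n_consonants) if c not in chosen]
            weights = [degree[c] + epsilon for c in candidates]
            chosen.add(rng.choices(candidates, weights=weights, k=1)[0])
        # Degrees are updated once the whole inventory has been drawn.
        for c in chosen:
            degree[c] += 1
        languages.append(chosen)
    return languages, degree

# Usage: three hypothetical languages drawing from a pool of 20 consonants.
languages, degree = grow_bipartite([5, 7, 6], n_consonants=20, epsilon=0.5)
print(sorted(degree, reverse=True))
```

Smaller values of epsilon make the choice more strongly preferential; very large values make it close to uniform. Updating degrees only after a whole inventory has been drawn is one design choice among several.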

The Structure and Dynamics of Linguistic Networks

M Choudhury, A Mukherjee

Dynamics on and of Complex Networks: Applications to Biology, Computer Science, Economics, and the Social Sciences, pages 145--166, 2009

This survey explores the structure and dynamics of natural languages in the framework of complex networks. We begin with a description of lexical networks, where the nodes are words and the edges represent lexical relationships between two words, such as phonetic and semantic similarity. This is followed by an overview of various networks where the nodes are again words but, unlike in lexical networks, the edges represent co-occurrence in similar contexts. These networks represent the interactions among words as governed by the grammar rules of a language. Next we discuss the properties of phonological networks, where the nodes are sub-lexical units such as phonemes or syllables. Applications of linguistic networks in Natural Language Processing (NLP) and Information Retrieval (IR) are also discussed. We conclude the survey by enumerating some open problems in the area of linguistic networks.

Self-organization of the Sound Inventories: Analysis and Synthesis of the Occurrence and Co-occurrence Networks of Consonants

A Mukherjee, M Choudhury, A Basu, N Ganguly

Journal of Quantitative Linguistics 16(2):157-184, 2009

The sound inventories of the world's languages self-organize, giving rise to similar cross-linguistic patterns. In this work we attempt to capture this phenomenon of self-organization, which shapes the structure of the consonant inventories, through a complex network approach. For this purpose we define the occurrence and co-occurrence networks of consonants and systematically study some of their important topological properties. A crucial observation is that both the occurrence and the co-occurrence of consonants across languages follow a power-law distribution. This property is arguably a consequence of the principle of preferential attachment. To support this argument we propose a synthesis model that reproduces the degree distribution of the networks to a close approximation. We further observe that the co-occurrence network of consonants shows a high degree of clustering, and subsequently refine our synthesis model to incorporate this property. Finally, we discuss how preferential attachment manifests itself through the evolutionary nature of language.
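
The clustering property mentioned above can be made concrete with a small Python sketch. It uses an unweighted simplification of the co-occurrence network (two consonants are linked if they appear together in at least one inventory), which is our assumption rather than the paper's construction; the function names and toy inventories are hypothetical.

```python
from itertools import combinations

def cooccurrence_graph(inventories):
    """Adjacency sets: link two consonants that co-occur in at least one inventory."""
    adj = {}
    for inv in inventories:
        for c in inv:
            adj.setdefault(c, set())
        for a, b in combinations(inv, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj

def average_clustering(adj):
    """Mean local clustering coefficient; nodes with fewer than 2 neighbours count as 0."""
    if not adj:
        return 0.0
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges among the neighbours of `node`.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)

# Toy inventories (hypothetical).
toy = [{"p", "t", "k"}, {"p", "b"}]
print(average_clustering(cooccurrence_graph(toy)))  # 7/12, about 0.583
```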

Discovering Global Patterns in Linguistic Networks through Spectral Analysis: A Case Study of the Consonant Inventories

A Mukherjee, M Choudhury, R Kannan

EACL 2009, pages 585-593, 2009

Recent research has shown that language and the socio-cognitive phenomena associated with it can be aptly modeled and visualized through networks of linguistic entities. However, most of the existing works on linguistic networks focus only on the local properties of the networks. This study is an attempt to analyze the structure of languages via a purely structural technique, namely spectral analysis, which is ideally suited for discovering the global correlations in a network. Application of this technique to PhoNet, the co-occurrence network of consonants, not only reveals several natural linguistic principles governing the structure of the consonant inventories, but is also able to quantify their relative importance. We believe that this powerful technique can be successfully applied, in general, to study the structure of natural languages.
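
As a rough illustration of the spectral idea, the Python sketch below builds a toy weighted co-occurrence matrix and extracts only its dominant eigenvalue and eigenvector by power iteration, a simplification of a full spectral analysis. The weighting (entry (i, j) counts the inventories containing both consonants i and j), the inventories and the function names are assumptions for illustration, not PhoNet's actual definition.

```python
import math
from itertools import combinations

def cooccurrence_matrix(inventories):
    """Symmetric matrix: w[i][j] = number of inventories containing both i and j."""
    nodes = sorted(set().union(*inventories))
    index = {c: i for i, c in enumerate(nodes)}
    n = len(nodes)
    w = [[0.0] * n for _ in range(n)]
    for inv in inventories:
        for a, b in combinations(inv, 2):
            i, j = index[a], index[b]
            w[i][j] += 1
            w[j][i] += 1
    return nodes, w

def dominant_eigenpair(w, iters=1000, tol=1e-12):
    """Power iteration on W + I for a non-negative symmetric W."""
    n = len(w)
    x = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        # Multiply by (W + I); the shift keeps the largest eigenvalue dominant.
        y = [sum(w[i][j] * x[j] for j in range(n)) + x[i] for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        y = [v / norm for v in y]
        converged = max(abs(a - b) for a, b in zip(x, y)) < tol
        x = y
        if converged:
            break
    wx = [sum(w[i][j] * x[j] for j in range(n)) for i in range(n)]
    lam = sum(a * b for a, b in zip(x, wx))  # Rayleigh quotient (x has unit norm)
    return lam, x

# Hypothetical inventories.
toy = [{"p", "t", "k"}, {"p", "t"}, {"p", "k"}]
nodes, w = cooccurrence_matrix(toy)
lam, vec = dominant_eigenpair(w)
print(nodes, round(lam, 3), [round(v, 3) for v in vec])
```

The largest component of the dominant eigenvector points to the most central node. For the full spectrum, a library eigensolver for symmetric matrices would be the usual choice.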

Modeling the Structure and Dynamics of the Consonant Inventories: A Complex Network Approach

A Mukherjee, M Choudhury, A Basu, N Ganguly

Proceedings of COLING-08, 2008

We study the self-organization of the consonant inventories through a complex network approach. We observe that the distributions of both the occurrence and the co-occurrence of consonants across languages follow a power law. The co-occurrence network of consonants exhibits a high clustering coefficient. We propose four novel synthesis models for these networks, each a refinement of the previous one, that successively match with higher accuracy (a) the above-mentioned topological properties and (b) the linguistic property of feature economy exhibited by the consonant inventories. We conclude by arguing that one possible interpretation of this mechanism of network growth is the process of child language acquisition. Such models deepen our understanding of how the structure of languages is influenced by their evolutionary dynamics, which in turn can be extremely useful for building future NLP applications.

Rediscovering the Co-occurrence Principles of Vowel Inventories: A Complex Network Approach

A Mukherjee, M Choudhury, A Basu, N Ganguly, SR Chowdhury

Advances in Complex Systems 11(3):371-392, 2008

In this work, we attempt to capture patterns of co-occurrence across vowel systems and, at the same time, identify the nature of the force leading to the emergence of such patterns. For this purpose we define a weighted network where the vowels are the nodes and an edge between two nodes (read vowels) signifies their co-occurrence likelihood over the vowel inventories. Through this network we identify communities of vowels, which essentially reflect their patterns of co-occurrence across languages. We observe that in the assortative vowel communities the constituent nodes (read vowels) are largely uncorrelated in terms of their features, and show that these communities are formed based on the principle of maximal perceptual contrast. However, in the rest of the communities, the constituent vowels show strong correlations with respect to their features, indicating that it is the principle of feature economy that binds them together. We validate these observations by proposing quantitative measures of perceptual contrast and feature economy, and by comparing the resulting values with those obtained under the assumption that the vowel inventories evolved just by chance.

Computational Models of Real World Phonological Change

M Choudhury

Indian Institute of Technology Kharagpur, 2007

As you are reading these words, millions of neurons are triggered in your brain; through a mysterious coordination and combination of electrical signals, they paint the meaning of the sentence on the canvas of the mind. Despite such a complex underlying mechanism, we ...

Evolution, optimization and language change: the case of Bengali verb inflections

M Choudhury, V Jalan, S Sarkar, A Basu

Proceedings of Ninth Meeting of the ACL Special Interest Group in Computational Morphology and Phonology, 2007

The verb inflections of Bengali underwent a series of phonological changes between the 10th and 18th centuries, which gave rise to several modern dialects of the language. In this paper, we offer a functional explanation for this change by quantifying the functional pressures of ease of articulation, perceptual contrast and learnability through objective functions, constraints, or both. The resulting multi-objective, multi-constraint optimization problem is solved using a genetic algorithm, through which we observe the emergence of Pareto-optimal dialects in the system that closely resemble some of the real ones.
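
The Pareto-optimality criterion mentioned above can be illustrated with a short Python sketch of a non-dominated filter. It does not reproduce the genetic algorithm itself; the candidate names, the three cost values per candidate, and the convention that every objective is a cost to be minimized are hypothetical.

```python
def dominates(a, b):
    """True if cost vector a is no worse than b everywhere and strictly better somewhere.

    All objectives are treated as costs to be minimized (an assumption).
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    """Names of candidates that no other candidate dominates."""
    return [
        name
        for name, cost in candidates.items()
        if not any(
            dominates(other_cost, cost)
            for other_name, other_cost in candidates.items()
            if other_name != name
        )
    ]

# Hypothetical dialects scored as (articulatory effort, perceptual confusability, learning cost).
dialects = {
    "A": (1, 5, 3),
    "B": (2, 2, 2),
    "C": (3, 3, 3),  # dominated by B on every objective
    "D": (5, 1, 4),
}
print(pareto_front(dialects))  # ['A', 'B', 'D']
```

Each surviving candidate represents a different trade-off among the pressures; none can be improved on one objective without getting worse on another.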

Modeling the Co-occurrence Principles of the Consonant Inventories: A Complex Network Approach

A Mukherjee, M Choudhury, A Basu, N Ganguly

International Journal of Modern Physics C 18(2):281-295, 2007

Speech sounds of languages all over the world show remarkable patterns of co-occurrence. In this work, we attempt to automatically capture the patterns of co-occurrence of consonants across languages and, at the same time, identify the nature of the force leading to the emergence of such patterns. For this purpose we define a weighted network where the consonants are the nodes and an edge between two nodes (read consonants) signifies their co-occurrence likelihood over the consonant inventories. Through this network we identify communities of consonants that essentially reflect their patterns of co-occurrence across languages. We test the goodness of the communities and observe that the constituent consonants also frequently occur in such groups in real languages. Interestingly, the consonants forming these communities show strong correlations in terms of their features, which indicates that the principle of feature economy acts as a driving force towards community formation. To measure the strength of this force we propose an information-theoretic definition of feature economy and show that the feature economy exhibited by the consonant communities is indeed substantially better than it would be if the consonant inventories had evolved just by chance.
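
One way to make an information-theoretic notion of feature economy concrete is sketched below in Python. The measure used here, the sum over features of the binary entropy of each feature's frequency within a group, is an assumed stand-in and not necessarily the paper's definition. Under it, lower values mean the group reuses its features more systematically. The feature labels and both groups are hypothetical.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a binary variable with P(1) = p; 0 when p is 0 or 1."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def feature_entropy(group):
    """Sum over features of the binary entropy of that feature within a group.

    group: dict mapping each consonant to its set of (hypothetical) features.
    """
    n = len(group)
    features = set().union(*group.values())
    return sum(
        binary_entropy(sum(f in feats for feats in group.values()) / n)
        for f in features
    )

# A group that reuses a few features versus one that does not (both hypothetical).
natural_class = {
    "p": {"bilabial", "plosive"},
    "b": {"bilabial", "plosive", "voiced"},
    "t": {"alveolar", "plosive"},
    "d": {"alveolar", "plosive", "voiced"},
}
mixed_group = {
    "m": {"bilabial", "nasal", "voiced"},
    "s": {"alveolar", "fricative"},
    "k": {"velar", "plosive"},
    "l": {"alveolar", "lateral", "voiced"},
}
print(feature_entropy(natural_class), feature_entropy(mixed_group))
```

Comparing such a score for observed communities against groups drawn at random from the same consonant pool would give a chance baseline of the kind the abstract describes.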

Redundancy ratio: an invariant property of the consonant inventories of the world's languages

A Mukherjee, M Choudhury, A Basu, N Ganguly

Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, 2007

In this paper, we put forward an information theoretic definition of the redundancy that is observed across the sound inventories of the world's languages. Through rigorous statistical analysis, we find that this redundancy is an invariant property of the consonant inventories. The statistical analysis further unfolds that the vowel inventories do not exhibit any such property, which in turn points to the fact that the organizing principles of the vowel and the consonant inventories are quite different in nature.

Emergence of community structures in vowel inventories: an analysis based on complex networks

A Mukherjee, M Choudhury, A Basu, N Ganguly

Proceedings of Ninth Meeting of the ACL Special Interest Group in Computational Morphology and Phonology, 2007

In this work, we attempt to capture patterns of co-occurrence across vowel systems and, at the same time, identify the nature of the force leading to the emergence of such patterns. For this purpose we define a weighted network where the vowels are the nodes and an edge between two nodes (read vowels) signifies their co-occurrence likelihood over the vowel inventories. Through this network we identify communities of vowels, which essentially reflect their patterns of co-occurrence across languages. We observe that in the assortative vowel communities the constituent nodes (read vowels) are largely uncorrelated in terms of their features, indicating that these communities are formed based on the principle of maximal perceptual contrast. However, in the rest of the communities, the constituent vowels show strong correlations with respect to their features, indicating that it is the principle of feature economy that binds them together.

Analysis and Synthesis of the Distribution of Consonants over Languages: A Complex Network Approach

M Choudhury, A Mukherjee, A Basu, N Ganguly

COLING-ACL06, 2006

Cross-linguistic similarities are reflected in the speech sound systems of languages all over the world. In this work we try to model such similarities observed in the consonant inventories through a complex bipartite network. We present a systematic study of some of the appealing features of these inventories with the help of the bipartite network. An important observation is that the occurrence of consonants follows a two-regime power-law distribution. We find that the distribution of consonant inventory sizes and the principle of preferential attachment are the main reasons behind the emergence of this two-regime behavior. To further support our explanation, we present a synthesis model for this network based on the general theory of preferential attachment.

Multi-Agent Simulation of Emergence of Schwa Deletion Pattern in Hindi

M Choudhury, A Basu, S Sarkar

Journal of Artificial Societies and Social Simulation 9(2), 2006

Recently, there has been a revival of interest in multi-agent simulation techniques for exploring the nature of language change. However, a lack of appropriate validation of simulation experiments against real language data often calls into question the general applicability of these methods in modeling realistic language change. We try to address this issue here by making an attempt to model the phenomenon of schwa deletion in Hindi through a multi-agent simulation framework. The pattern of Hindi schwa deletion and its diachronic nature are well studied, not only out of general linguistic inquiry, but also to facilitate Hindi grapheme-to-phoneme conversion, which is a preprocessing step to text-to-speech synthesis. We show that under certain conditions, the schwa deletion pattern observed in modern Hindi emerges in the system from an initial state of no deletion. The simulation framework described in this work can be extended to model other phonological changes as well.

A Diachronic Approach for Schwa Deletion in Indo Aryan Languages

M Choudhury, A Basu, S Sarkar

Proceedings of the Seventh Meeting of the ACL Special Interest Group in Computational Phonology, pages 20--26, 2004

Schwa deletion is an important issue in grapheme-to-phoneme conversion for Indo-Aryan languages (IAL). In this paper, we describe a syllable-minimization-based algorithm for this problem that outperforms the existing methods in terms of efficiency and accuracy. The algorithm is motivated by the fact that schwa deletion is a diachronic and sociolinguistic phenomenon that facilitates faster communication through syllable economy. The contribution of the paper is not just a better algorithm for schwa deletion; rather, we describe a constrained-optimization-based framework that can partly model the evolution of languages and hence can be used to solve many problems in computational linguistics that call for diachronic explanations.

Language Evolution and Computation Bibliography