Language Evolution and Computation Bibliography

Our site (www.isrl.uiuc.edu/amag/langev) has been retired; please use https://langev.com instead.
Journal :: Journal of Quantitative Linguistics
2016
Journal of Quantitative Linguistics 23(2):133-153, 2016
Vocalizations, and less often gestures, have been the object of linguistic research for decades. However, the development of a general theory of communication with human language as a particular case requires a clear understanding of the organization of communication through other means. Infochemicals are chemical compounds that carry information and are employed by small organisms that cannot emit acoustic signals of an optimal frequency to achieve successful communication. Here, we investigate the distribution of infochemicals across species when they are ranked by their degree, that is, the number of species with which they are associated (because they produce them or are sensitive to them). We evaluate the quality of the fit of different functions to the dependency between degree and rank by means of a penalty for the number of parameters of the function. Surprisingly, a double Zipf (a Zipf distribution with two regimes, each with a different exponent) is the model yielding the best fit although it is the function with the largest number of parameters. This suggests that the worldwide repertoire of infochemicals contains a core which is shared by many species and is reminiscent of the core vocabularies found for human language in dictionaries or large corpora.
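The penalized model comparison described in this abstract can be illustrated with a minimal Python sketch, assuming synthetic degree-rank data and a simple AIC-style penalty; the candidate functions, break rank and fitting choices below are illustrative assumptions, not the authors' exact procedure.

    # Penalized comparison of a single Zipf law against a "double Zipf"
    # (two power-law regimes) for a degree-vs-rank curve. Data are synthetic.
    import numpy as np
    from scipy.optimize import curve_fit

    rng = np.random.default_rng(0)
    ranks = np.arange(1, 201).astype(float)
    # Synthetic degrees with two regimes joined at rank 30 (illustrative only).
    degrees = np.where(ranks <= 30, 1000 * ranks ** -0.6,
                       1000 * 30 ** 1.0 * ranks ** -1.6)
    degrees = degrees * rng.lognormal(0.0, 0.05, size=ranks.size)

    def single_zipf(r, c, a):
        return c * r ** -a

    def double_zipf(r, c, a1, a2, rb):
        # Two power-law regimes joined continuously at rank rb.
        return np.where(r <= rb, c * r ** -a1, c * rb ** (a2 - a1) * r ** -a2)

    def fit(model, p0):
        # Fit in log space so both regimes carry comparable weight.
        log_model = lambda r, *p: np.log(model(r, *p))
        popt, _ = curve_fit(log_model, ranks, np.log(degrees), p0=p0, maxfev=20000)
        return popt

    def aic(model, popt, k):
        # Gaussian AIC on the log residuals: 2k + n*log(RSS/n).
        resid = np.log(degrees) - np.log(model(ranks, *popt))
        n = resid.size
        return 2 * k + n * np.log(np.sum(resid ** 2) / n)

    p1 = fit(single_zipf, p0=[1000.0, 1.0])
    p2 = fit(double_zipf, p0=[1000.0, 0.5, 1.5, 25.0])
    print("single Zipf AIC:", aic(single_zipf, p1, k=2))
    print("double Zipf AIC:", aic(double_zipf, p2, k=4))

The extra parameters of the double Zipf are only preferred when the drop in residual error outweighs the AIC penalty, which is the kind of trade-off the abstract refers to.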
2012
Journal of Quantitative Linguistics 19(2):95-120, 2012
We apply a successful cognate identification learning model based on point accepted mutation (PAM)-like matrices to the task of linguistic phylogenetic inference. We train our system and employ the learned parameters for measuring the lexical distance ...
2010
Modeling the Redundancy of Human Speech Sound Inventories: An Information Theoretic Approach
Journal of Quantitative Linguistics, 2010
In traditional generative linguistics, the sounds of a language are represented as bundles of binary-valued features. The sounds used in a language are not randomly chosen from a universal repository of phonemes, but are known to be correlated in terms of the features they use. Discovery of these correlation patterns and of the organizational principles behind the structure of sound inventories has been one of the classic problems in phonology. In this work, we show that the amount of redundancy present in the sound inventory of a language, an information-theoretic measure reflecting the ratio of the number of distinctive features used in the language to the minimum number of features required to distinguish between the sounds present in the language, lies within a very narrow range irrespective of factors such as the size of the inventory, the language family and the typology. This is a significant, hitherto unreported observation that points to a universal structural property of the sound inventories of human languages. This property might be an outcome of self-organization of the sound inventories through the processes of language acquisition and change, or of the way in which phonemes are represented in generative phonology.
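As a rough illustration of the redundancy measure sketched in this abstract, the following Python fragment compares the number of binary features a toy inventory uses with the information-theoretic minimum, ceil(log2(N)), needed to keep N sounds distinct; the inventory and its feature matrix are invented placeholders, and the paper's exact definition may differ in detail.

    # Redundancy as (features actually used) / (minimum features needed).
    import math

    # Hypothetical inventory: phoneme -> binary feature vector (toy data).
    inventory = {
        "p": (0, 0, 0, 1), "b": (1, 0, 0, 1), "t": (0, 0, 1, 1),
        "d": (1, 0, 1, 1), "m": (1, 1, 0, 1), "n": (1, 1, 1, 1),
    }

    n_sounds = len(inventory)
    n_features_used = len(next(iter(inventory.values())))
    # Minimum number of binary features that could distinguish n_sounds sounds.
    n_features_min = math.ceil(math.log2(n_sounds))

    redundancy = n_features_used / n_features_min
    print(f"{n_features_used} features used, {n_features_min} needed "
          f"-> redundancy ratio {redundancy:.2f}")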
2009
Journal of Quantitative Linguistics 16(2):157-184, 2009
The sound inventories of the world's languages self-organize, giving rise to similar cross-linguistic patterns. In this work we attempt to capture this phenomenon of self-organization, which shapes the structure of the consonant inventories, through a complex network approach. For this purpose we define the occurrence and co-occurrence networks of consonants and systematically study some of their important topological properties. A crucial observation is that the occurrence as well as the co-occurrence of consonants across languages follows a power-law distribution. This property is arguably a consequence of the principle of preferential attachment. In order to support this argument we propose a synthesis model which reproduces the degree distribution of the networks to a close approximation. We further observe that the co-occurrence network of consonants shows a high degree of clustering, and we subsequently refine our synthesis model in order to incorporate this property. Finally, we discuss how preferential attachment manifests itself through the evolutionary nature of language.
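A preferential-attachment synthesis model of the kind mentioned above can be sketched in a few lines of Python: each new synthetic language draws its consonants with probability proportional to how many languages already use them, plus a small uniform term. All sizes and constants here are illustrative assumptions, not the parameters of the authors' model.

    # Preferential-attachment style growth of a consonant occurrence network.
    import random
    from collections import Counter

    random.seed(1)
    N_CONSONANTS = 300        # size of the consonant pool (illustrative)
    N_LANGUAGES = 500         # number of synthetic languages to generate
    INVENTORY_SIZE = 22       # consonants per synthetic language
    EPSILON = 1.0             # uniform term so unused consonants can still be picked

    usage = Counter({c: 0 for c in range(N_CONSONANTS)})  # consonant -> #languages

    for _ in range(N_LANGUAGES):
        weights = [usage[c] + EPSILON for c in range(N_CONSONANTS)]
        chosen = set()
        while len(chosen) < INVENTORY_SIZE:
            chosen.add(random.choices(range(N_CONSONANTS), weights=weights)[0])
        for c in chosen:
            usage[c] += 1

    # Tail of the resulting degree distribution (degree, number of consonants).
    degree_counts = Counter(usage.values())
    for degree in sorted(degree_counts, reverse=True)[:10]:
        print(degree, degree_counts[degree])

With the uniform term small relative to typical degrees, early-adopted consonants keep attracting new languages, which is what produces the heavy-tailed degree distribution.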
Journal of Quantitative Linguistics 16(3):256-273, 2009
This article investigates probability distributions of the dependency relation extracted from a Chinese dependency treebank. The author shows the frequency distributions of dependency type, of word class both as a dependent and as a governor, of verb as a governor, and of noun as a dependent. The fitting results reveal that most of the investigated distributions are fitted excellently by a modified right-truncated Zipf-Alekseev distribution. In the analysis of exponential regressions, most of the coefficients of determination R² are very high, which provides further evidence that the investigated distributions are well fitted.
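For readers who want to reproduce this kind of fit, the Python sketch below fits a Zipf-Alekseev-type function, assuming the common form f(x) proportional to x^-(a + b·ln x); the modified right-truncated variant used in the article adds further adjustments (an extra parameter for the first class and truncation at the largest rank), and the frequencies below are invented toy data.

    # Fitting a Zipf-Alekseev-type curve to a toy frequency spectrum.
    import numpy as np
    from scipy.optimize import curve_fit

    freqs = np.array([420, 180, 95, 60, 41, 30, 23, 18, 15, 12], dtype=float)
    x = np.arange(1, freqs.size + 1).astype(float)

    def zipf_alekseev(x, c, a, b):
        return c * x ** -(a + b * np.log(x))

    popt, _ = curve_fit(zipf_alekseev, x, freqs, p0=[freqs[0], 1.0, 0.1])
    fitted = zipf_alekseev(x, *popt)

    ss_res = np.sum((freqs - fitted) ** 2)
    ss_tot = np.sum((freqs - freqs.mean()) ** 2)
    print("a=%.3f b=%.3f R^2=%.4f" % (popt[1], popt[2], 1 - ss_res / ss_tot))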
2002
Journal of Quantitative Linguistics 9:35-47, 2002
Certain word types of natural languages - conjunctions, articles, prepositions and some verbs - make a very low or strongly grammatically marked semantic contribution. They are usually called functional categories or relational items. Recently, it has been suggested that prepositions may be treated as simple parametric variations of semantic features rather than of categorial features, or that such categorial features are simply irrelevant. The discussion about such particles has been and still is widespread and controversial. Nonetheless, there is no quantitative evidence of such semantic weakness and no satisfactory evidence against the coexistence of categorial requirements and the fragility of the semantic aspects. This study aims to quantify the semantic contribution of particles and presents some corpus-based results for English which suggest that such weakness and its relational uncertainty come from the categorial irrelevance mentioned above.
2001
Journal of Quantitative Linguistics 8(3):165-173, 2001
Zipf's law states that the frequency of a word is a power function of its rank. The exponent of the power is usually accepted to be close to -1. Large deviations between the predicted and observed number of different words in a text, disagreements between the predicted and observed exponent of the probability density function, and statistics on a large corpus make it evident that word frequency as a function of rank follows two different exponents: approximately -1 for the first regime and approximately -2 for the second. The implications of the change in exponents for the metrics of texts and for the origins of complex lexicons are analyzed.
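The two-regime behaviour can be checked on any large plain-text corpus with a short Python sketch: estimate the rank-frequency exponent separately over low and high ranks by least squares in log-log space. The file name and the rank ranges below are assumptions for illustration; the corpus must be large enough to contain several thousand word types.

    # Estimate separate rank-frequency exponents for low and high ranks.
    import re
    from collections import Counter
    import numpy as np

    with open("corpus.txt", encoding="utf-8") as fh:   # any large text file
        words = re.findall(r"[a-z']+", fh.read().lower())

    freqs = np.array(sorted(Counter(words).values(), reverse=True), dtype=float)
    ranks = np.arange(1, freqs.size + 1)

    def slope(lo, hi):
        # Least-squares slope of log(frequency) vs log(rank) over ranks lo..hi.
        r, f = np.log(ranks[lo:hi]), np.log(freqs[lo:hi])
        return np.polyfit(r, f, 1)[0]

    print("first regime exponent :", slope(0, 2000))           # expect roughly -1
    print("second regime exponent:", slope(5000, freqs.size))  # expect roughly -2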
1998
Journal of Quantitative Linguistics 5(3):240-245, 1998
Synergetic models of language structure predict that the length of a word will depend upon various parameters such as its frequency and the number of phonemes in the language. This prediction has been used to explain word length differences within languages, but less often to explain the differences between languages. Here I show that average word length across 12 West African languages is related to the size of the phonological inventory. This is an apparent example of the adaptation of language structure to the efficient communication of information. The hypothesised mechanism by which the relationship evolves is outlined.
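The reported relationship can be probed with a simple cross-linguistic correlation, as in the Python sketch below; the language names and numbers are invented placeholders rather than the 12 West African languages studied, and serve only to show the shape of the calculation.

    # Correlate phoneme inventory size with average word length across languages.
    import numpy as np
    from scipy.stats import pearsonr

    # language -> (phoneme inventory size, average word length in phonemes); toy data.
    data = {
        "lang_a": (18, 6.1), "lang_b": (24, 5.4), "lang_c": (30, 4.9),
        "lang_d": (35, 4.5), "lang_e": (40, 4.2), "lang_f": (22, 5.7),
    }

    sizes = np.array([v[0] for v in data.values()])
    lengths = np.array([v[1] for v in data.values()])

    r, p = pearsonr(sizes, lengths)
    print(f"Pearson r = {r:.2f}, p = {p:.3f}")  # a negative r is what the claim predicts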