Language Evolution and Computation Bibliography

Our site (www.isrl.uiuc.edu/amag/langev) has been retired; please use https://langev.com instead.
Journal :: Physica A: Statistical Mechanics and its Applications
2012
The spatial distribution of clusters and the formation of mixed languages in bilingual competition
Physica A: Statistical Mechanics and its Applications, 2012
We use cellular automata simulation methods to study the competition between two languages (language A and B). We assume each of the two languages consists of F independent features and define an individual as two F-length “identity level” integer ...
Chinese lexical networks: The structure, function and formation
Physica A: Statistical Mechanics and its Applications, 2012
In this paper Chinese phrases are modeled using complex networks theory. We analyze statistical properties of the networks and find that phrase networks display some important features: not only small world and the power-law distribution, but also hierarchical ...
Modeling language evolution: Aromanian, an endangered language in Greece
Physica A: Statistical Mechanics and its Applications, 2012
Time evolution of the relative density of speakers of an endangered language, Aromanian, which is spoken by a bilingual community in North-Western Greece, is approached theoretically by means of a two-state model and a three-state model. The same prestige ...
2011
Physica A: Statistical Mechanics and its Applications 390(7):1370-1380, 2011
The study of properties of speech sound systems is of great significance in understanding the human cognitive mechanism and the working principles of speech sound systems. Some properties of speech sound systems, such as the listener-oriented feature and the talker-oriented feature, have been unveiled with the statistical study of phonemes in human languages and the research of the interrelations between human articulatory gestures and the corresponding acoustic parameters. With all the phonemes of speech sound systems treated as a coherent whole, our research, which focuses on the dynamic properties of speech sound systems in operation, investigates some statistical parameters of Chinese phoneme networks based on real text and dictionaries. The findings are as follows: phonemic networks have high connectivity degrees and short average distances; the degrees obey normal distribution and the weighted degrees obey power law distribution; vowels enjoy higher priority than consonants in the actual operation of speech sound systems; the phonemic networks have high robustness against targeted attacks and random errors. In addition, for investigating the structural properties of a speech sound system, a statistical study of dictionaries is conducted, which shows the higher frequency of shorter words and syllables and the tendency that the longer a word is, the shorter the syllables composing it are. From these structural properties and dynamic properties one can derive the following conclusion: the static structure of a speech sound system tends to promote communication efficiency and save articulation effort while the dynamic operation of this system gives preference to reliable transmission and easy recognition. In short, a speech sound system is an effective, efficient and reliable communication system optimized in many aspects.
2009
Influence of geography on language competition
Physica A: Statistical Mechanics and its Applications 388(2):174-186, 2009
Competition between languages or cultural traits diffusing in the same geographical area is studied combining the model of Abrams and Strogatz with a model of human dispersal on an inhomogeneous substrate. Also, the effect of population growth is discussed. It is shown ...
Physica A: Statistical Mechanics and its Applications 388(5):732-746, 2009
We invoke the Tsallis entropy formalism, a nonextensive entropy measure, to include some degree of non-locality in a neural network that is used for simulation of novel word learning in adults. A generalization of the gradient descent dynamics, realized via nonextensive cost functions, is used as a learning rule in a simple perceptron. The model is first investigated for general properties, and then tested against the empirical data, gathered from simple memorization experiments involving two populations of linguistically different subjects. Numerical solutions of the model equations corresponded to the measured performance states of human learners. In particular, we found that the memorization tasks were executed with rather small but population-specific amounts of nonextensivity, quantified by the entropic index q. Our findings raise the possibility of using entropic nonextensivity as a means of characterizing the degree of complexity of learning in both natural and artificial systems.
Physica A: Statistical Mechanics and its Applications 388(14):2874-2879, 2009
Human history leaves fingerprints in human languages. Little is known about language evolution, and its study is of great importance. Here, we construct a simple stochastic model and compare its results to statistical data of real languages. The model is based on the recent findings that language changes occur independently of the population size. We find agreement with the data when additionally assuming that languages may be distinguished by having at least one among a finite, small number of different features. This finite set is also used to define the distance between two languages, following the linguistic tradition since Swadesh.
Physica A: Statistical Mechanics and its Applications 388(17):3615-3620, 2009
The naming game model characterizes the main evolutionary features of languages or, more generally, of communication systems. Very recently, the combination of complex networks and the naming game has received much attention, and the influences of various topological properties on the corresponding dynamical behavior have been widely studied. In this paper, we investigate the naming game on small-world geographical networks. The small-world geographical networks are constructed by randomly adding links to two-dimensional regular lattices, and it is found that the convergence time is a nonmonotonic function of the geographical distance of randomly added shortcuts. This phenomenon indicates that, although a long geographical distance of the added shortcuts favors consensus achievement, too long a geographical distance of the added shortcuts inhibits the convergence process, making it even slower than for moderate distances.
Comparison of co-occurrence networks of the Chinese and English languages
Physica A: Statistical Mechanics and its Applications 388(23):4901-4909, 2009
Co-occurrence networks of Chinese characters and words, and of English words, are constructed from collections of Chinese and English articles, respectively. Four types of collections are considered, namely, essays, novels, popular science articles, and news ...
2008
Physica A: Statistical Mechanics and its Applications 387(12):3039-3047, 2008
Chinese is spoken by the largest number of people in the world, and it is regarded as one of the most important languages. In this paper, we explore the statistical properties of Chinese language networks (CLNs) within the framework of complex network theory. Based on one of the largest Chinese corpora, i.e. People's Daily Corpus, we construct two networks (CLN1 and CLN2) from two different perspectives, with Chinese words as nodes. In CLN1, a link between two nodes exists if they appear next to each other in at least one sentence; in CLN2, a link represents that two nodes appear simultaneously in a sentence. We show that both networks exhibit small-world effect, scale-free structure, hierarchical organization and disassortative mixing. These results indicate that in many topological aspects Chinese language shapes complex networks with organizing principles similar to other previously studied language systems, which shows that different languages may have some common characteristics in their evolution processes. We believe that our research may shed some new light on the Chinese language and find some potentially significant implications.
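The two link rules in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `build_networks`, the set-of-edges representation, and the assumption that the input is already segmented into word tokens are all hypothetical choices made for illustration.

```python
from itertools import combinations

def build_networks(sentences):
    """Build two undirected edge sets from pre-segmented sentences.

    sentences: list of lists of word tokens (segmentation is assumed done).
    Returns (adjacent, cooccur), each a set of frozenset({word_a, word_b}).
    """
    adjacent = set()  # CLN1-style rule: words next to each other in a sentence
    cooccur = set()   # CLN2-style rule: words appearing in the same sentence
    for words in sentences:
        # Link each word to its immediate successor (self-loops skipped).
        for a, b in zip(words, words[1:]):
            if a != b:
                adjacent.add(frozenset((a, b)))
        # Link every pair of distinct words in the sentence.
        for a, b in combinations(set(words), 2):
            cooccur.add(frozenset((a, b)))
    return adjacent, cooccur

# Toy input; the tokens are placeholders, not corpus data.
adjacent, cooccur = build_networks([["a", "b", "c"], ["c", "d"]])
```

Under these rules every adjacency link is also a co-occurrence link, so the first edge set is always a subset of the second.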
Physica A: Statistical Mechanics and its Applications 387(12):3048-3058, 2008
This paper proposes how to build a syntactic network based on syntactic theory and presents some statistical properties of Chinese syntactic dependency networks based on two Chinese treebanks with different genres. The results show that the two syntactic networks are small-world networks, and their degree distributions obey a power law. The finding, that the two syntactic networks have the same diameter and different average degrees, path lengths, clustering coefficients and power exponents, can be seen as an indicator that complexity theory can work as a means of stylistic study. The paper links the degree of a vertex with a valency of a word, the small world with the minimized average distance of a language, that reinforces the explanations of the findings from linguistics.
Physica A: Statistical Mechanics and its Applications 387(13):3242-3252, 2008
The standard three-state voter model is enlarged by including the outside pressure favouring one of the three language choices and by adding some biased internal random noise. The Monte Carlo simulations are motivated by states with the population divided into three groups of various affinities to each other. We show the crucial influence of the boundaries for moderate lattice sizes like 500 x 500. By removing the fixed boundary at one side, we demonstrate that this can lead to the victory of one single choice. Noise in contrast stabilizes the choices of all three populations. In addition, we compute the persistence probability, i.e., the number of sites that have never changed their opinion during the simulation, and we consider the case of "rigid-minded" decision makers.
Physica A: Statistical Mechanics and its Applications 387(22):5597-5601, 2008
We have recently investigated the evolution of linguistic diversity by means of a simple spatial model that considers selective geographic colonization, linguistic anomalous diffusion and mutation. In the model, regions of the lattice are characterized by the amount of resources available to populations which are going to colonize the region. In that approach, the resources were ascribed in a random and uncorrelated way. Here, we extend the previous model and introduce a degree of correlation for the resource landscape. A change of the qualitative scenario is observed for high correlation, where the increase of the linguistic diversity with area is faster than for weakly correlated landscapes. For weakly correlated landscapes, the dependence of diversity on area shows two scaling regimes, while we observe the emergence of another scaling region for highly correlated landscapes.
Physica A: Statistical Mechanics and its Applications 387(2-3):661-666, 2008
The phenomenon of human language is widely studied from various points of view. It is interesting not only for social scientists, anthropologists or philosophers, but also for those interested in network dynamics. In several recent papers the word web, or language as a graph, has been investigated [R.F. Cancho, R. Sole, The small world of human language, Proc. R. Soc. London B 268 (2001) 2261-2265; A.E. Motter, P.S. de Moura, Lai Ying-Cheng, P. Dasgupta, Topology of the conceptual network of language, Phys. Rev. E 65 (2002) R 065102; M. Steyvers, J.B. Tenenbaum, The large-scale structure of semantic networks: Statistical analysis and a model of semantic growth, Cogn. Sci. 29 (2005) 41-78]. In this paper I revise recent studies of the syntactical word web [R.F. Cancho, R. Sole, The small world of human language, Proc. R. Soc. London B 268 (2001) 2261-2265; S.N. Dorogovtsev, J.F.F. Mendes, Language as an evolving word web, Proc. R. Soc. London B 268 (2001) 2603-2606]. I present a model of a growing network in which such processes as node addition, edge rewiring and new link creation are taken into account. I argue that this model is a satisfactory minimal model explaining measured data [R.F. Cancho, R. Sole, The small world of human language, Proc. R. Soc. London B 268 (2001) 2261-2265; M. Markosova, P. Nather, Language as a graph, in: V. Kvasnicka, P. Trebaticky, J. Pospichal (Eds.), Mind, Intelligence and Life, Kelemen, STU Bratislava, 2007, pp. 298-307 (in Slovak)].
2007
Strong correlations between text quality and complex networks features
Physica A: Statistical Mechanics and its Applications 373:811-820, 2007
Concepts of complex networks have been used to obtain metrics that were correlated to text quality established by scores assigned by human judges. Texts produced by high-school students in Portuguese were represented as scale-free networks (word adjacency model), ...
Physica A: Statistical Mechanics and its Applications 374(2):835-842, 2007
The differential equation of Abrams and Strogatz for the competition between two languages is compared with agent-based Monte Carlo simulations for fully connected networks as well as for lattices in one, two and three dimensions, with up to 10^9 agents. In the case of socially equivalent languages, agent-based models and a mean-field approximation give grossly different results.
Physica A: Statistical Mechanics and its Applications 379(2):661-664, 2007
Using the Schulze model for Monte Carlo simulations of language competition, we include a barrier between the top half and the bottom half of the lattice. We check under which conditions two different languages evolve as dominating in the two halves.
Physica A: Statistical Mechanics and its Applications 379(2):665-671, 2007
We examine the evolution of the vocabulary of a group of individuals (linguistic agents) on a scale-free network, using Monte Carlo simulations and assumptions from evolutionary game theory. It is known that when the agents are arranged in a two-dimensional lattice structure and interact by diffusion and encounter, then their final vocabulary size is the maximum possible. Knowing all available words is essential in order to increase the probability to 'survive' by effective reproduction. On scale-free networks we find a different result. It is not necessary to learn the entire vocabulary available. Survival chances are increased by using the vocabulary of the 'hubs' (nodes with high degree). The existence of the 'hubs' in a scale-free network is the source of an additional important fitness generating mechanism.
Chinese character structure analysis based on complex networks
Physica A: Statistical Mechanics and its Applications 380:629-638, 2007
In this paper, Chinese character networks are modelled using complex networks theory. We analyze statistical properties of the networks and find that character networks also display two important features as other real networks, i.e., small-world feature and the non-Poisson ...
2006
Physica A: Statistical Mechanics and its Applications 361(1):355-360, 2006
In this work we study the dynamics of language competition. In Abrams and Strogatz [Modeling the dynamics of language death, Nature 424 (2003) 900], the extinction of one of the competing languages is predicted, although in some cases coexistence occurs. The preservation of both languages was explained by Patriarca and Leppanen [Modeling language competition, Physica A 338 (2004) 296] by introducing the existence of two disjoint zones where each language is predominant. However, their results cannot explain the survival of both languages in only one zone of competition. In this work we discuss their results and propose a new alternative model of Lotka-Volterra type in order to explain the coexistence of two languages.
Physica A: Statistical Mechanics and its Applications 361(1):361-370, 2006
Here we describe how some important scaling laws observed in the distribution of languages on Earth can emerge from a simple computer simulation. The proposed language dynamics includes processes of selective geographic colonization, linguistic anomalous diffusion and mutation, and interaction among populations that occupy different regions. It is found that the dependence of the linguistic diversity on the area after colonization displays two power law regimes, both described by critical exponents which are dependent on the mutation probability. Most importantly for the future prospects of the world's population, our results show that the linguistic diversity always decreases to a very small asymptotic value if large areas and sufficiently long times of interaction among populations are considered.
Physica A: Statistical Mechanics and its Applications 366:495-502, 2006
We use the formulation of equilibrium statistical mechanics in order to study some important characteristics of language. Using a simple expression for the Hamiltonian of a language system, which is directly implied by the Zipf law, we are able to explain several characteristic features of human language that seem completely unrelated, such as the universality of the Zipf exponent, the vocabulary size of children, the reduced communication abilities of people suffering from schizophrenia, etc. While several explanations are necessarily only qualitative at this stage, we have, nevertheless, been able to derive a formula for the vocabulary size of children as a function of age, which agrees rather well with experimental data.
Physica A: Statistical Mechanics and its Applications 368(1):257-261, 2006
We have recently introduced a simple spatial computer simulation model to study the evolution of the linguistic diversity. The model considers processes of selective geographic colonization, linguistic anomalous diffusion and mutation. In the approach, we ascribe to each language a fitness function which depends on the number of people that speak that language. Here, we extend the aforementioned model to examine the role of saturation of the fitness on the language dynamics. We found that the dependence of the linguistic diversity on the area after colonization displays a power law regime with a nontrivial exponent in very good agreement with the measured exponent associated with the actual distribution of languages on the Earth.
Physica A: Statistical Mechanics and its Applications 370(2):808-816, 2006
We use the detrended fluctuation analysis (DFA) and the Grassberger-Procaccia analysis (GP) methods in order to study language characteristics. Although we construct our signals using only word lengths or word frequencies, thereby excluding a huge amount of information from language, the application of GP analysis indicates that linguistic signals may be considered as the manifestation of a complex system of high dimensionality, different from random signals or systems of low dimensionality such as the Earth climate. The DFA method is additionally able to distinguish a natural language signal from a computer code signal. This last result may be useful in the field of cryptography.
Physica A: Statistical Mechanics and its Applications 371(2):719-724, 2006
The bit-string model of Schulze and Stauffer (2005) is applied to non-equilibrium situations and then gives better agreement with the empirical distribution of language sizes. Here the size is the number of people having this language as mother tongue. In contrast, when equilibrium is combined with irreversible mutations of languages, one language always dominates and is spoken by at least 80 percent of the population.
2005
Physica A: Statistical Mechanics and its Applications 345(1-2):275-284, 2005
Here, assuming a general communication model where objects map to signals, a power function for the distribution of signal frequencies is derived. The model relies on the satisfaction of the receiver (hearer) communicative needs when the entropy of the number of objects per signal is maximized. Evidence of power distributions in a linguistic context (some of them with exponents clearly different from the typical β ≈ 2 of Zipf's law) is reviewed and expanded. We support the view that Zipf's law reflects some sort of optimization but following a novel realistic approach where signals (e.g. words) are used according to the objects (e.g. meanings) they are linked to. Our results strongly suggest that many systems in nature use non-trivial strategies for easing the interpretation of a signal. Interestingly, constraining just the number of interpretations of signals does not lead to scaling.
Physica A: Statistical Mechanics and its Applications 353:595-612, 2005
We use Monte Carlo simulations and assumptions from evolutionary game theory in order to study the evolution of words and the population dynamics of a system made of two interacting species which initially speak two different languages. The species are characterized by their identity, vocabulary, and have different initial fitness, i.e. reproduction capability. We investigate how different initial fitness affects the vocabulary of the species or the population dynamics by leading to a permanent populational advantage. We further find that the spatial distributions of the species may cause the system to exhibit pattern formation or segregation. We show that an initial fitness advantage, even though very quickly balanced, leads to better spatial arrangement and enhances survival probabilities of the species. In most cases the system will arrive at a final state where both languages coexist. However, in cases where one species greatly outnumbers the other in population and fitness, then only one species survives with its 'final' language having a slightly richer vocabulary than its initial language. Thus, our results offer an explanation for the existence and origin of synonyms in spoken languages.
Physica A: Statistical Mechanics and its Applications 355(2-4):678-684, 2005
We develop a network using the syllables of the Portuguese language. In this language the syllables are close to the basic phonetic units. The nodes of the network are the syllables. The links are established each time two syllables form part of the same word. We use two different data sets to perform the numerics: a Portuguese dictionary and the complete work of the most important Brazilian writer, Machado de Assis. The syllabic network shows a low distance and a high clustering coefficient when compared with an associated Erdős-Rényi graph and with an associated random network with the same distribution of connectivity. The distribution of connectivity of the syllabic network follows a power law with exponent γ ≈ 1.4, indicating complex behavior.
2004
Physica A: Statistical Mechanics and its Applications 338(1-2):296-299, 2004
We consider a model introduced recently [Nature 424(2003)900], for describing competition between two languages, which in typical situations predicts the extinction of one of them. We generalize it by introducing a spatial dependence in terms of a reaction-diffusion equation. We show that in this generalized model both languages can survive, each mostly concentrated in a different geographical area.
1999
Physica A: Statistical Mechanics and its Applications 271(3-4):489-495, 1999
The distribution of living languages is investigated and scaling relations are found for the diversity of languages as a function of the country area and population. These results are compared with data from Ecology and from computer simulations of fragmentation dynamics where similar scalings appear. The language size distribution is also studied and shown to display two scaling regions: (i) one for the largest (in population) languages and (ii) another one for intermediate-size languages. It is then argued that these two classes of languages may have distinct growth dynamics, being distributed on the sets of different fractal dimensions.