Language Evolution and Computation Bibliography

Partha Niyogi
2009
PNAS 106(25):10124-10129, 2009
Language acquisition maps linguistic experience, primary linguistic data (PLD), onto linguistic knowledge, a grammar. Classically, computational models of language acquisition assume a single target grammar and one PLD source, the central question being whether the target grammar can be acquired from the PLD. However, real-world learners confront populations with variation, i.e., multiple target grammars and PLDs. Removing this idealization has inspired a new class of population-based language acquisition models. This paper contrasts 2 such models. In the first, iterated learning (IL), each learner receives PLD from one target grammar but different learners can have different targets. In the second, social learning (SL), each learner receives PLD from possibly multiple targets, e.g., from 2 parents. We demonstrate that these 2 models have radically different evolutionary consequences. The IL model is dynamically deficient in 2 key respects. First, the IL model admits only linear dynamics and so cannot describe phase transitions, attested rapid changes in languages over time. Second, the IL model cannot properly describe the stability of languages over time. In contrast, the SL model leads to nonlinear dynamics, bifurcations, and possibly multiple equilibria and so suffices to model both the case of stable language populations, mixtures of more than 1 language, as well as rapid language change. The 2 models also make distinct, empirically testable predictions about language change. Using historical data, we show that the SL model more faithfully replicates the dynamics of the evolution of Middle English.
2006
MIT Press, 2006
TABLE OF CONTENTS

I The Problem
1 Introduction

II Language Learning
2 Language Acquisition - Induction
3 Language Acquisition - Linguistics I
4 Language Acquisition - Linguistics II

III Language Change
5 Language Change - A Preliminary Model
6 Language Change - n Languages
7 An Application to Portuguese
8 An Application to Chinese
9 Cultural Evolution
10 Variations and Case Studies

IV The Origin of Language
11 Communicative Efficiency
12 Linguistic Coherence and Communicative Fitness
13 Linguistic Coherence and Social Learning

V Conclusions
14 Conclusions

Preface

This monograph explores the interplay between learning and evolution in the context of linguistic systems. For several decades now, the process of language acquisition has been conceptualized as a procedure that maps linguistic experience onto linguistic knowledge. If linguistic knowledge is characterized in computational terms as a formal grammar and the mapping procedure is algorithmic, this conceptualization admits computational and mathematical modes of inquiry into language learning. Indeed, such a view is implicit in most modern approaches to the subject in linguistics, cognitive science, and artificial intelligence.

Now learning (acquisition) is the mechanism by which language is transmitted from old speakers to new. Therefore, the evolution of language over generational time in linguistic populations will depend upon the learning procedure used by the individuals in it. Yet the interplay between learning by the individual and evolution of the population can be quite subtle. We need tools to reason about the phenomena and elucidate the precise nature of the relationships involved. To this end, this monograph presents a framework in which to conduct such an analysis.

...

Quantifying the Functional Load of Phonemic Oppositions, Distinctive Features, and Suprasegmentals
Competing Models of Language Change: Evolution and Beyond, 2006
Languages convey information using several methods, and rely to different extents on different methods. The amount of reliance of a language on a method is termed the 'functional load' of the method in the language. The term goes back to early Prague School ...
2004
Artificial Intelligence 154(1-2):1-42, 2004
We consider the problem of linguistic agents that communicate with each other about a shared world. We develop a formal notion of a language as a set of probabilistic associations between form (lexical or syntactic) and meaning (semantic) that has general applicability. Using this notion, we define a natural measure of the mutual intelligibility, F(L,L'), between two agents, one using the language L and the other using L'. We then proceed to investigate three important questions within this framework: (1) Given a language L, what language L' maximizes mutual intelligibility with L? We find surprisingly that L' need not be the same as L and we present algorithms for approximating L' arbitrarily well. (2) How can one learn to optimally communicate with a user of language L when L is unknown at the outset and the learner is allowed a finite number of linguistic interactions with the user of L? We describe possible algorithms and calculate explicit bounds on the number of interactions needed. (3) Consider a population of linguistic agents that learn from each other and evolve over time. Will the community converge to a shared language and what is the nature of such a language? We characterize the evolutionarily stable states of a population of linguistic agents in a game-theoretic setting. Our analysis has significance for a number of areas in natural and artificial communication where one studies the design, learning, and evolution of linguistic communication systems.
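As a rough illustration only: the abstract does not state the formula for F(L,L'), so the sketch below assumes one formalization that is common in this literature. Each language is represented by a production matrix P (meaning to form) and a comprehension matrix Q (form to meaning), and intelligibility is the probability of successful communication, averaged over both directions. The matrix representation, the uniform weighting over meanings, and the name `mutual_intelligibility` are assumptions made for this example.

```python
def mutual_intelligibility(P1, Q1, P2, Q2):
    """Assumed form of F(L, L') for L = (P1, Q1) and L' = (P2, Q2).

    P[i][j]: probability that a speaker uses form j for meaning i.
    Q[j][i]: probability that a hearer maps form j to meaning i.
    Meanings are weighted uniformly (an assumption of this sketch).
    """
    n_meanings = len(P1)
    n_forms = len(Q1)

    def success(P, Q):
        # Probability that a randomly chosen meaning is recovered by the hearer.
        total = sum(P[i][j] * Q[j][i]
                    for i in range(n_meanings)
                    for j in range(n_forms))
        return total / n_meanings

    # Average over the two directions of communication.
    return 0.5 * (success(P1, Q2) + success(P2, Q1))


# Toy languages with two meanings and two forms (hypothetical values).
identity = [[1.0, 0.0], [0.0, 1.0]]   # meaning i <-> form i
swapped = [[0.0, 1.0], [1.0, 0.0]]    # meaning i <-> the other form
ambiguous = [[0.5, 0.5], [0.5, 0.5]]  # every form equally likely

same = mutual_intelligibility(identity, identity, identity, identity)
crossed = mutual_intelligibility(identity, identity, swapped, swapped)
vague = mutual_intelligibility(ambiguous, ambiguous, ambiguous, ambiguous)
```

Under these assumptions, two users of the same unambiguous language understand each other perfectly, users of the two mismatched languages never do, and a fully ambiguous language succeeds half the time even with itself.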
Phase Transitions in Language Evolution
Variation and Universals in Biolinguistics, 2004
Language is transmitted from one generation to the next via learning by individuals. By taking this point of view one is able to link the linguistic behavior of successive generations and therefore study how language evolves over generational time scales. We provide a brief overview of this approach to the study of language evolution, its formalization as a dynamical system, and the analogical connections to the methodological principles of evolutionary biology. We show how the interplay between learning and evolution can be quite subtle and how phase transitions arise in many such models of language evolution. Such phase transitions may provide a suitable theoretical construct with which explanations for rapid language change or evolution may be given. Some illustrative examples are provided.
2002
The Computational Study of Diachronic Linguistics
Syntactic Effects of Morphological Change, 2002
At the heart of these models is the subtle interplay between language.
Theories of cultural evolution and their application to language evolution
Linguistic Evolution through Language Acquisition: Formal and Computational Models 7.0, 2002
Nature 417:611-617, 2002
Language is our legacy. It is the main evolutionary contribution of humans, and perhaps the most interesting trait that has emerged in the past 500 million years. Understanding how darwinian evolution gives rise to human language requires the integration of formal language theory, learning theory and evolutionary dynamics. Formal language theory provides a mathematical description of language and grammar. Learning theory formalizes the task of language acquisition--it can be shown that no procedure can learn an unrestricted set of languages. Universal grammar specifies the restricted set of languages learnable by the human brain. Evolutionary dynamics can be formulated to describe the cultural evolution of language and the biological evolution of universal grammar.
2001
Journal of Theoretical Biology 209(1):43-59, 2001
Grammar is the computational system of language. It is a set of rules that specifies how to construct sentences out of words. Grammar is the basis of the unlimited expressibility of human language. Children acquire the grammar of their native language without formal education simply by hearing a number of sample sentences. Children could not solve this learning task if they did not have some pre-formed expectations. In other words, children have to evaluate the sample sentences and choose one grammar out of a limited set of candidate grammars. The restricted search space and the mechanism which allows to evaluate the sample sentences is called universal grammar. Universal grammar cannot be learned; it must be in place when the learning process starts. In this paper, we design a mathematical theory that places the problem of language acquisition into an evolutionary context. We formulate equations for the population dynamics of communication and grammar learning. We ask how accurate children have to learn the grammar of their parents' language for a population of individuals to evolve and maintain a coherent grammatical system. It turns out that there is a maximum error tolerance for which a predominant grammar is stable. We calculate the maximum size of the search space that is compatible with coherent communication in a population. Thus, we specify the conditions for the evolution of universal grammar.
Science 291:114-118, 2001
Universal grammar specifies the mechanism of language acquisition. It determines the range of grammatical hypothesis that children entertain during language learning and the procedure they use for evaluating input sentences. How universal grammar arose is a major challenge for evolutionary biology. We present a mathematical framework for the evolutionary dynamics of grammar learning. The central result is a coherence threshold, which specifies the condition for a universal grammar to induce coherent communication within a population. We study selection of grammars within the same universal grammar and competition between different universal grammars. We calculate the condition under which natural selection favors the emergence of rule-based, generative grammars that underlie complex language.
1998
Syntax: A Journal of Theoretical, Experimental, and Interdisciplinary Research 1(2):192-205, 1998
In this article we present new results of a novel computational approach to the interaction of two important cognitive-linguistic phenomena: (1) language learning; and (2) language change over time (diachronic linguistics). We exploit the insight that while language learning takes place at the individual level, language change is more properly regarded as an ensemble property that takes place at the level of populations of language learners. We show by analytical and computer simulation methods that language learning can be regarded as the driving force behind a dynamical systems account of language change. We apply this model to the specific case of historical change from Classical Portuguese to European Portuguese, demonstrating how a particular language learning model coupled with data on the differences between Classical and European Portuguese leads to specific predictions for possible language-change envelopes. The main investigative message of this paper is to show how this methodology can be applied to a specific case, that of Portuguese. The main moral underscores the individual/population difference; we show that simply because an individual will choose a particular grammar does not mean that all other grammars will be eliminated.
1997
A Dynamical Systems Model for Language Change
Complex Systems 11:161-204, 1997
This paper formalizes linguists' intuitions about language change, proposing a dynamical systems model for language change derived from a model for language acquisition. Linguists must explain not only how languages are learned but also how and why they ...
Evolutionary Consequences of Language Learning
Linguistics and Philosophy 20(6):697-719, 1997
Linguists' intuitions about language change can be captured by a dynamical systems model derived from the dynamics of language acquisition. Rather than having to posit a separate model for diachronic change, as has sometimes been done by drawing on assumptions from population biology (cf. Cavalli-Sforza and Feldman, 1973; 1981; Kroch, 1990), this new model dispenses with these independent assumptions by showing how the behavior of individual language learners leads to emergent, global population characteristics of linguistic communities over several generations. As the simplest case, we formalize the example of two grammars and show that even this situation leads directly to a nonlinear (quadratic) dynamical system. We study this one parameter model in a variety of situations for different kinds of acquisition algorithms and maturational times, showing how different learning theories can have very different evolutionary consequences. This allows us to formulate an evolutionary criterion for the adequacy of grammatical and learning theories. An application of the computational model to the historical loss of Verb Second from Old French to Modern French is described showing how otherwise adequate grammatical theories might fail the evolutionary criterion.
Modeling the Dynamics of Historical Linguistics
New England Conference on Complex Systems, 1997
Populations of Learners: The Case of European Portuguese
Proceedings of the Nineteenth Annual Conference of the Cognitive Science Society, 1997
1996
Cognition 61(1-2):161-193, 1996
This paper shows how to formally characterize language learning in a finite parameter space, for instance, in the principles-and-parameters approach to language, as a Markov structure. New language learning results follow directly; we can explicitly calculate how many positive examples on average ("sample complexity") it will take for a learner to correctly identify a target language with high probability. We show how sample complexity varies with input distributions and learning regimes. In particular we find that the average time to converge under reasonable language input distributions for a simple three-parameter system first described by Gibson and Wexler (1994) is psychologically plausible, in the range of 100-150 positive examples. We further find that a simple random step algorithm - that is, simply jumping from one language hypothesis to another rather than changing one parameter at a time - works faster and always converges to the right target language, in contrast to the single-step, local parameter setting method advocated in some recent work.
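To make the Markov-chain view concrete, here is a minimal sketch of how an average convergence time can be computed once learning is written as a chain over hypothesis states with the target as an absorbing state. It uses the standard result that the expected absorption times t satisfy (I - Q) t = 1, where Q is the transition matrix restricted to non-target states. The three-state matrix and the name `expected_steps_to_target` are hypothetical; this is not the Gibson and Wexler system or the paper's own code.

```python
def expected_steps_to_target(T, target):
    """Expected number of examples before the learner first reaches `target`.

    T[a][b] is the probability of moving from hypothesis a to hypothesis b
    after one example. Assumes `target` is reachable from every state.
    Returns a dict mapping each non-target state to its expected time.
    """
    states = [s for s in range(len(T)) if s != target]
    n = len(states)
    # Augmented system [I - Q | 1], with Q the chain restricted to non-target states.
    A = [[(1.0 if r == c else 0.0) - T[states[r]][states[c]] for c in range(n)]
         + [1.0]
         for r in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return {states[r]: A[r][n] / A[r][r] for r in range(n)}


# Hypothetical toy chain: states 0 and 1 are wrong hypotheses, state 2 is the target.
toy_chain = [
    [0.5, 0.3, 0.2],
    [0.2, 0.5, 0.3],
    [0.0, 0.0, 1.0],
]
times = expected_steps_to_target(toy_chain, target=2)
```

For this toy chain the learner needs on average about 4.2 examples from state 0 and about 3.7 from state 1. Changing the transition probabilities, which stand in for the input distribution and learning regime, changes these averages.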