Language Evolution and Computation Bibliography

Our site (www.isrl.uiuc.edu/amag/langev) retired, please use https://langev.com instead.
Eric Smith
2018
Journal of Language Evolution 3(2):94-129, 2018
The increasing availability of large digital corpora of cross-linguistic data is revolutionizing many branches of linguistics. Overall, it has triggered a shift of attention from detailed questions about individual features to more global patterns amenable to rigorous, but ...MORE ⇓
The increasing availability of large digital corpora of cross-linguistic data is revolutionizing many branches of linguistics. Overall, it has triggered a shift of attention from detailed questions about individual features to more global patterns amenable to rigorous, but statistical, analyses. This engenders an approach based on successive approximations where models with simplified assumptions result in frameworks that can then be systematically refined, always keeping explicit the methodological commitments and the assumed prior knowledge. Therefore, they can resolve disputes between competing frameworks quantitatively by separating the support provided by the data from the underlying assumptions. These methods, though, often appear as a ‘black box’ to traditional practitioners. In fact, the switch to a statistical view complicates comparison of the results from these newer methods with traditional understanding, sometimes leading to misinterpretation and overly broad claims. We describe here this evolving methodological shift, attributed to the advent of big, but often incomplete and poorly curated data, emphasizing the underlying similarity of the newer quantitative to the traditional comparative methods and discussing when and to what extent the former have advantages over the latter. In this review, we cover briefly both randomization tests for detecting patterns in a largely model-independent fashion and phylolinguistic methods for a more model-based analysis of these patterns. We foresee a fruitful division of labor between the ability to computationally process large volumes of data and the trained linguistic insight identifying worthy prior commitments and interesting hypotheses in need of comparison.
2016
PNAS 113(7):1766-71, 2016
How universal is human conceptual structure? The way concepts are organized in the human brain may reflect distinct features of cultural, historical, and environmental background in addition to properties universal to human cognition. Semantics, or meaning expressed through ...MORE ⇓
How universal is human conceptual structure? The way concepts are organized in the human brain may reflect distinct features of cultural, historical, and environmental background in addition to properties universal to human cognition. Semantics, or meaning expressed through language, provides indirect access to the underlying conceptual structure, but meaning is notoriously difficult to measure, let alone parameterize. Here, we provide an empirical measure of semantic proximity between concepts using cross-linguistic dictionaries to translate words to and from languages carefully selected to be representative of worldwide diversity. These translations reveal cases where a particular language uses a single "polysemous" word to express multiple concepts that another language represents using distinct words. We use the frequency of such polysemies linking two concepts as a measure of their semantic proximity and represent the pattern of these linkages by a weighted network. This network is highly structured: Certain concepts are far more prone to polysemy than others, and naturally interpretable clusters of closely related concepts emerge. Statistical analysis of the polysemies observed in a subset of the basic vocabulary shows that these structural properties are consistent across different language groups, and largely independent of geography, environment, and the presence or absence of a literary tradition. The methods developed here can be applied to any semantic domain to reveal the extent to which its conceptual structure is, similarly, a universal attribute of human cognition and language use.
2015
Current Biology 25:1-9, 2015
BACKGROUND Concerted evolution is normally used to describe parallel changes at different sites in a genome, but it is also observed in languages where a specific phoneme changes to the same other phoneme in many words in the lexicon—a phenomenon known as regular sound change. We develop a ...MORE ⇓
BACKGROUND Concerted evolution is normally used to describe parallel changes at different sites in a genome, but it is also observed in languages where a specific phoneme changes to the same other phoneme in many words in the lexicon—a phenomenon known as regular sound change. We develop a general statistical model that can detect concerted changes in aligned sequence data and apply it to study regular sound changes in the Turkic language family. RESULTS Linguistic evolution, unlike the genetic substitutional process, is dominated by events of concerted evolutionary change. Our model identified more than 70 historical events of regular sound change that occurred throughout the evolution of the Turkic language family, while simultaneously inferring a dated phylogenetic tree. Including regular sound changes yielded an approximately 4-fold improvement in the characterization of linguistic change over a simpler model of sporadic change, improved phylogenetic inference, and returned more reliable and plausible dates for events on the phylogenies. The historical timings of the concerted changes closely follow a Poisson process model, and the sound transition networks derived from our model mirror linguistic expectations. CONCLUSIONS We demonstrate that a model with no prior knowledge of complex concerted or regular changes can nevertheless infer the historical timings and genealogical placements of events of concerted change from the signals left in contemporary data. Our model can be applied wherever discrete elements—such as genes, words, cultural trends, technologies, or morphological traits—can change in parallel within an organism or other evolving group.
2010
Communication and collective action: language and the evolution of human cooperationPDF
Evolution and Human Behavior 31(4):231--245, 2010
All social species face various “collective action problems”(CAPs) or “social dilemmas,” meaning problems in achieving cooperating when the best move from a selfish point of view yields an inferior collective outcome. Compared to most other species, humans are very ...
2007
Phonological Reconstruction of a Dead Language Using the Gradual Learning AlgorithmPDF
Proceedings of Ninth Meeting of the ACL Special Interest Group in Computational Morphology and Phonology, 2007
This paper discusses the reconstruction of the Elamite language s phonology from its orthography using the Gradual Learning Algorithm, which was re-purposed to learn underlying phonological forms from surface orthography. Practical issues are raised regarding the difficulty of ...MORE ⇓
This paper discusses the reconstruction of the Elamite language s phonology from its orthography using the Gradual Learning Algorithm, which was re-purposed to learn underlying phonological forms from surface orthography. Practical issues are raised regarding the difficulty of mapping between orthography and phonology, and Optimality Theory s neglected Lexicon Optimization module is highlighted.