N. Cristianini
2012
Journal of Quantitative Linguistics 19(2):95--120, 2012
Abstract We apply to the task of linguistic phylogenetic inference a successful cognate identification learning model based on point accepted mutation (PAM)-like matrices. We train our system and we employ the learned parameters for measuring the lexical distance ...
2006
A statistical analysis of language evolution
Proceedings of the 6th International Conference on the Evolution of Language, pages 348-355, 2006
We propose to address a series of questions related to the evolution of languages by statistical analysis of written text. We develop a "statistical signature" of a language, analogous to the genetic signature proposed by Karlin in biology, and we show its stability within languages and its discriminative power between languages. Using this representation, we address the question of its trajectory during language evolution. We first reconstruct a phylogenetic tree of Indo-European (IE) languages using this property, thereby showing that it also contains enough information to act as a "tracking" tag for a language during its evolution. One advantage of this kind of phylogenetic tree is that it does not depend on any semantic assessment or on any choice of words. We use the "statistical signature" to analyze a time series of documents from four Romance languages, following their transition from Latin. The languages are Italian, French, Spanish and Portuguese, and the time points correspond to all centuries from III BC to XX AD.