Marco Turchi
2006
A statistical analysis of language evolution
Proceedings of the 6th International Conference on the Evolution of Language, pages 348-355, 2006
We propose to address a series of questions related to the evolution of languages by statistical analysis of written text. We develop a "statistical signature" of a language, analogous to the genetic signature proposed by Karlin in biology, and we show its stability within languages and its discriminative power between languages. Using this representation, we address the question of its trajectory during language evolution. We first reconstruct a phylogenetic tree of Indo-European (IE) languages using this property, thereby showing that it also contains enough information to act as a "tracking" tag for a language during its evolution. One advantage of phylogenetic trees of this kind is that they do not depend on any semantic assessment or on any choice of words. We use the "statistical signature" to analyze a time series of documents from four Romance languages, following their transition from Latin. The languages are Italian, French, Spanish and Portuguese, and the time points correspond to all centuries from III BC to XX AD.
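
The abstract does not spell out how the signature is computed. As a rough illustration only, the sketch below assumes it mirrors Karlin's genomic signature, that is, the odds ratio of each pair of adjacent symbols against the product of the single-symbol frequencies, applied here to characters of text. The function names `signature` and `distance`, the default alphabet, and the mean-absolute-difference distance are illustrative assumptions, not the paper's definitions.

```python
from collections import Counter


def signature(text, alphabet="abcdefghijklmnopqrstuvwxyz "):
    """Character-bigram odds ratios rho[xy] = f(xy) / (f(x) * f(y)).

    Assumption: this follows the form of Karlin's dinucleotide signature;
    the paper's own definition may differ.
    """
    allowed = set(alphabet)
    # Keep only characters from the chosen alphabet, lowercased.
    chars = [c for c in text.lower() if c in allowed]
    n_uni = len(chars)
    n_bi = max(n_uni - 1, 1)  # avoid division by zero on very short input
    unigrams = Counter(chars)
    bigrams = Counter(zip(chars, chars[1:]))

    sig = {}
    for x in alphabet:
        fx = unigrams[x] / n_uni if n_uni else 0.0
        for y in alphabet:
            fy = unigrams[y] / n_uni if n_uni else 0.0
            fxy = bigrams[(x, y)] / n_bi
            # Pairs involving an unseen character get a ratio of 0.
            sig[x + y] = fxy / (fx * fy) if fx and fy else 0.0
    return sig


def distance(sig_a, sig_b):
    """Mean absolute difference between two signatures (an assumed metric)."""
    keys = sig_a.keys() | sig_b.keys()
    return sum(abs(sig_a.get(k, 0.0) - sig_b.get(k, 0.0)) for k in keys) / len(keys)
```

Under these assumptions, a matrix of pairwise `distance` values between per-language signatures could be passed to any standard distance-based tree-building method to obtain a tree like the one the abstract describes.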