Language Evolution and Computation Bibliography

Our site (www.isrl.uiuc.edu/amag/langev) has been retired; please use https://langev.com instead.
Journal :: Journal of the American Society for Information Science and Technology
2009
Journal of the American Society for Information Science and Technology 60(4):837-843, 2009
It has been argued that the actual distribution of word frequencies could be reproduced or explained by generating a random sequence of letters and spaces according to the so-called intermittent silence process. The same kind of process could reproduce or explain the counts of other kinds of units from a wide range of disciplines. Taking the linguistic metaphor, we focus on the frequency spectrum, i.e., the number of words with a certain frequency, and the vocabulary size, i.e., the number of different words of text generated by an intermittent silence process. We derive and explain how to calculate accurately and efficiently the expected frequency spectrum and the expected vocabulary size as a function of the text size.
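The quantities named in this abstract can be illustrated by simulation. The following is a minimal sketch, not the authors' analytical derivation: it assumes a two-letter alphabet with equiprobable letters, a fixed space probability `p_space`, text size measured in characters, and that empty words (runs of consecutive spaces) are discarded. The function names and parameter values are hypothetical.

```python
import random
from collections import Counter


def intermittent_silence_text(n_chars, letters="ab", p_space=0.3, seed=0):
    """Generate a random sequence of letters and spaces.

    Assumption: each character is a space with probability p_space,
    otherwise a letter drawn uniformly from `letters`.
    """
    rng = random.Random(seed)
    chars = []
    for _ in range(n_chars):
        if rng.random() < p_space:
            chars.append(" ")
        else:
            chars.append(rng.choice(letters))
    return "".join(chars)


def frequency_spectrum(text):
    """Return (word_counts, spectrum).

    word_counts maps each distinct word to its frequency.
    spectrum maps a frequency f to the number of distinct words
    that occur exactly f times.
    Assumption: str.split() is used, so empty words are dropped.
    """
    word_counts = Counter(text.split())
    spectrum = Counter(word_counts.values())
    return word_counts, spectrum


text = intermittent_silence_text(10_000)
word_counts, spectrum = frequency_spectrum(text)

# Vocabulary size: the number of different words in the generated text.
vocabulary_size = len(word_counts)
```

Averaging `spectrum` and `vocabulary_size` over many seeds gives empirical estimates of the expected values that the article computes exactly as a function of text size.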
2006
Journal of the American Society for Information Science and Technology 57(10):1326-1337, 2006
Word usage is of interest to linguists for its own sake as well as to social scientists and others seeking to track the spread of ideas, for example in public debates over political decisions. The historical evolution of language can be analysed with the tools of corpus linguistics through evolving corpora and the web. But word usage statistics can only be gathered for known words. In this article, techniques are described and tested for identifying new words from the web, focussing on the case when the words are related to a topic and have a hybrid form with a common sequence of letters. The results highlight the need to employ a combination of search techniques and show the wide potential of hybrid word family investigations in linguistics and social science.