[langev] Johann-Mattis List

Cross-Linguistic Data Formats, advancing data sharing and re-use in comparative linguistics

R Forkel, J List, SJ Greenhill, C Rzymski, S Bank, M Cysouw, H Hammarstrom, M Haspelmath, GA Kaiping, RD Gray

Scientific Data 5(180205), 2018

The amount of available digital data for the languages of the world is constantly increasing. Unfortunately, most of the digital data are provided in a large variety of formats and are therefore not amenable to comparison and re-use. The Cross-Linguistic Data Formats initiative proposes new standards for two basic types of data in historical and typological language comparison (word lists, structural datasets) and a framework to incorporate more data types (e.g. parallel texts and dictionaries). The new specification for cross-linguistic data formats comes along with a software package for validation and manipulation, a basic ontology which links to more general frameworks, and usage examples of best practices.

Sequence comparison in computational historical linguistics

J List, M Walworth, SJ Greenhill, T Tresoldi, R Forkel

Journal of Language Evolution 3(2):130-144, 2018

With increasing amounts of digitally available data from all over the world, manual annotation of cognates in multi-lingual word lists becomes more and more time-consuming in historical linguistics. Using available software packages to pre-process the data prior to manual analysis can drastically speed up the process of cognate detection. Furthermore, it allows us to get a quick overview of data which have not yet been intensively studied by experts. LingPy is a Python library which provides a large arsenal of routines for sequence comparison in historical linguistics. With LingPy, linguists can not only automatically search for cognates in lexical data, but they can also align the automatically identified words and output them in various forms, which aim at facilitating manual inspection. In this tutorial, we will briefly introduce the basic concepts behind the algorithms employed by LingPy and then illustrate in concrete workflows how automatic sequence comparison can be applied to multi-lingual word lists. The goal is to provide the readers with all information they need to (1) carry out cognate detection and alignment analyses in LingPy, (2) select the appropriate algorithms for the appropriate task, (3) evaluate how well automatic cognate detection algorithms perform compared to experts, and (4) export their data into various formats useful for additional analyses or data sharing. While basic knowledge of the Python language is useful for all analyses, our tutorial is structured in such a way that scholars with basic knowledge of computing can follow through all steps as well.

Networks uncover hidden lexical borrowing in Indo-European language evolution

S Nelson-Sathi, JM List, H Geisler, H Fangerau, RD Gray, W Martin, T Dagan

Proceedings of the Royal Society B: Biological Sciences 278(1713):1794-1803, 2011

Language evolution is traditionally described in terms of family trees with ancestral languages splitting into descendent languages. However, it has long been recognized that language evolution also entails horizontal components, most commonly through lexical ...

Language Evolution and Computation Bibliography