Haitao Liu
2011
Europhysics Letters 93(2):28005, 2011
This study uses complex-network approaches to investigate the word form networks and the lemma networks extracted from dependency syntactic treebanks of fifteen languages. The results show that human languages can be classified by means of the main parameters of complex networks, with a precision comparable to that of contemporary word order typology. Clustering experiments indicate that the difference between the word form networks and the lemma networks can improve the classification of languages. In short, dependency syntactic networks can reflect degrees of morphological variation and morphological complexity.
Physica A: Statistical Mechanics and its Applications 390(7):1370-1380, 2011
The study of the properties of speech sound systems is of great significance for understanding human cognitive mechanisms and the working principles of speech sound systems. Some properties of speech sound systems, such as the listener-oriented feature and the talker-oriented feature, have been revealed by statistical studies of phonemes in human languages and by research on the interrelations between human articulatory gestures and the corresponding acoustic parameters. Treating all the phonemes of a speech sound system as a coherent whole, our research focuses on the dynamic properties of speech sound systems in operation and investigates some statistical parameters of Chinese phoneme networks based on real text and dictionaries. The findings are as follows: phonemic networks have high connectivity degrees and short average distances; the degrees follow a normal distribution and the weighted degrees follow a power-law distribution; vowels enjoy higher priority than consonants in the actual operation of speech sound systems; and the phonemic networks are highly robust against targeted attacks and random errors. In addition, to investigate the structural properties of a speech sound system, a statistical study of dictionaries is conducted, which shows that shorter words and syllables are more frequent and that the longer a word is, the shorter its syllables tend to be. From these structural and dynamic properties one can conclude that the static structure of a speech sound system tends to promote communication efficiency and save articulation effort, while the dynamic operation of the system favors reliable transmission and easy recognition. In short, a speech sound system is an effective, efficient and reliable communication system optimized in many respects.
2010
Chinese Science Bulletin 55(30):3458-3465, 2010
To investigate the feasibility of using complex networks in the study of linguistic typology, this paper builds and explores 15 linguistic complex networks based on the dependency syntactic treebanks of 15 languages. The results show that it is possible to classify human languages by means of the following main parameters of complex networks: (a) average node degree, (b) clustering coefficient, (c) average path length, (d) network centralization, (e) diameter, (f) power exponent of the degree distribution, and (g) the determination coefficient of power-law distributions. The precision of this method is similar to that achieved by modern word order typology. The paper addresses two problems of current linguistic typology: first, the language samples of typological studies are not real text; second, typological studies pay too much attention to local language structures when choosing typological parameters. The approach better captures global typological features of language; it not only enhances typological methods but is also valuable for developing applications of complex networks in the humanities, social sciences, and life sciences.
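As an illustration of some of the parameters listed in this abstract, the following minimal sketch computes (a) average degree, (b) clustering coefficient, (c) average path length and (e) diameter for an undirected, connected network stored as an adjacency dictionary. The function names and the toy graph are assumptions made for illustration; they are not taken from the paper, and the remaining parameters (centralization and the power-law fit) are not covered.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Set, Tuple

Graph = Dict[str, Set[str]]  # undirected network: node -> set of neighbours


def average_degree(g: Graph) -> float:
    """Mean number of neighbours per node."""
    return sum(len(nbrs) for nbrs in g.values()) / len(g)


def clustering_coefficient(g: Graph) -> float:
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for nbrs in g.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count links that exist among the neighbours of this node.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in g[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(g)


def shortest_path_lengths(g: Graph, source: str) -> Dict[str, int]:
    """Breadth-first search distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in g[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def path_length_and_diameter(g: Graph) -> Tuple[float, int]:
    """Average shortest path length and diameter (assumes g is connected)."""
    total, pairs, diameter = 0, 0, 0
    for source in g:
        for target, d in shortest_path_lengths(g, source).items():
            if target != source:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    return total / pairs, diameter


# Hypothetical toy network: a triangle a-b-c with one extra node d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(average_degree(toy))
print(clustering_coefficient(toy))
print(path_length_and_diameter(toy))
```

On real treebank networks with many thousands of nodes, a dedicated graph library would be a more practical choice than this all-pairs breadth-first search.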
2009
Chinese Science Bulletin 54(16):2781-2785, 2009
Almost all language networks at the word and syntactic levels are small-world and scale-free. This raises the question of whether a language network at a deeper semantic or cognitive level has similar properties. To answer this question, we built a Chinese semantic network based on a treebank with semantic role (argument structure) annotation and investigated its global statistical properties. The results show that although the semantic network is also small-world and scale-free, it differs from the syntactic network in hierarchical structure and K-nearest-neighbor correlation.
Using a Chinese treebank to measure dependency distance
Corpus Linguistics and Linguistic Theory 5(2):161-174, 2009
This article describes a method for calculating the "dependency distance" between the words in a text, i.e. the number of words that separate each word from the word on which it depends syntactically, and reports the results of applying this method to a Chinese treebank. This study shows that Chinese dependencies tend strongly to be governor-final and that the mean dependency distance of words is much higher for Chinese than for other languages that have been studied, including English, German and Japanese. It is unclear whether this difference means that Chinese is syntactically more difficult to process.
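The calculation described in this abstract can be sketched as follows. This is a minimal illustration and not the article's own implementation: it assumes each sentence is given as a list of 1-based governor positions (0 for the root), that the distance is the difference between the linear positions of governor and dependent (so adjacent words have distance 1), and that a positive sign marks a governor-final dependency. The function names and the example sentence are hypothetical.

```python
from typing import List


def dependency_distances(heads: List[int]) -> List[int]:
    """Signed distances: governor position minus dependent position.

    heads[i] is the 1-based position of the governor of word i + 1;
    0 marks the root, which has no governor and is skipped.
    """
    distances = []
    for dependent, governor in enumerate(heads, start=1):
        if governor == 0:
            continue
        distances.append(governor - dependent)
    return distances


def mean_dependency_distance(heads: List[int]) -> float:
    """Mean of the absolute dependency distances over all non-root words."""
    distances = dependency_distances(heads)
    if not distances:
        return 0.0
    return sum(abs(d) for d in distances) / len(distances)


# Hypothetical analysis of "The student read a book":
# The -> student, student -> read, read = root, a -> book, book -> read
example_heads = [2, 3, 0, 5, 3]
print(dependency_distances(example_heads))      # [1, 1, 1, -2]
print(mean_dependency_distance(example_heads))  # 1.25
```

Under this sign convention, the three positive values are governor-final dependencies and the single negative value is governor-initial.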
Journal of Quantitative Linguistics 16(3):256-273, 2009
This article investigates probability distributions of the dependency relation extracted from a Chinese dependency treebank. The author shows the frequency distributions of dependency type, of word class both as a dependent and a governor, of verb as a governor, and of noun as a dependent. The fitting results reveal that most of the investigated distributions are fitted very well by a modified right-truncated Zipf-Alekseev distribution. In the exponential regression analysis, most of the determination coefficients R² are very high, which provides further evidence that the investigated distributions are well fitted.
Dependency direction as a means of word-order typology: a method based on dependency treebanks
Lingua, 2009
Word-order typology often uses the linear order of binary grammatical pairs in sentences to classify a language. The present paper proposes a method based on dependency treebanks as a typological means. It investigates 20 languages using treebanks of different sizes, from 16 K to 1 million dependencies. The results show that some languages are more head-initial or head-final than others, but all contain head-initial and head-final elements. The 20 languages can be arranged on a continuum with completely head-initial and completely head-final patterns as the two ends. Data on subject-verb, object-verb and adjective-noun order are extracted from the treebanks for comparison with typological studies based on traditional means, and the results are similar. The investigation demonstrates that the proposed method is valid for positioning a language on the typological continuum and that resources from computational linguistics can also be used in language typology.
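One way to place a treebank on such a head-initial/head-final continuum is to count the share of dependencies in each direction. The sketch below assumes the same hypothetical input format as a list of 1-based governor positions per sentence (0 for the root); the function name and the toy treebank are illustrative and are not taken from the paper.

```python
from typing import Iterable, List, Tuple


def direction_percentages(sentences: Iterable[List[int]]) -> Tuple[float, float]:
    """Return (head-initial %, head-final %) over all non-root dependencies.

    A dependency is head-initial when the governor precedes its dependent
    and head-final when the governor follows it.
    """
    head_initial = 0
    head_final = 0
    for heads in sentences:
        for dependent, governor in enumerate(heads, start=1):
            if governor == 0:
                continue  # the root has no governor
            if governor < dependent:
                head_initial += 1
            else:
                head_final += 1
    total = head_initial + head_final
    if total == 0:
        return (0.0, 0.0)
    return (100.0 * head_initial / total, 100.0 * head_final / total)


# Hypothetical two-sentence treebank.
toy_treebank = [[2, 3, 0, 5, 3], [0, 1]]
print(direction_percentages(toy_treebank))  # (40.0, 60.0)
```

A language whose treebank yields values near (100, 0) or (0, 100) would sit at one end of the continuum, while mixed values fall in between.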
2008
Physica A: Statistical Mechanics and its Applications 387(12):3048-3058, 2008
This paper proposes a way to build a syntactic network based on syntactic theory and presents some statistical properties of Chinese syntactic dependency networks based on two Chinese treebanks of different genres. The results show that the two syntactic networks are small-world networks and that their degree distributions obey a power law. The finding that the two syntactic networks have the same diameter but different average degrees, path lengths, clustering coefficients and power exponents can be seen as an indicator that complexity theory can serve as a means of stylistic study. The paper links the degree of a vertex with the valency of a word, and the small-world property with the minimized average distance of a language, which reinforces the explanations of the findings from a linguistic perspective.
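A dependency network of the kind described here can be sketched by treating each distinct word form as a node and linking two nodes whenever one governs the other somewhere in the treebank. The code below is a minimal illustration under assumptions that do not come from the paper: sentences are lists of (word, governor position) pairs with 0 marking the root, edges are undirected and unweighted, and the function name and toy data are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Sentence = List[Tuple[str, int]]  # (word form, 1-based governor position; 0 = root)


def build_dependency_network(treebank: Iterable[Sentence]) -> Dict[str, Set[str]]:
    """Build an undirected network linking each word to the words it depends on."""
    network: Dict[str, Set[str]] = {}
    for sentence in treebank:
        for word, governor_pos in sentence:
            network.setdefault(word, set())  # make sure isolated words are nodes too
            if governor_pos == 0:
                continue  # the root has no governor
            governor = sentence[governor_pos - 1][0]
            if governor != word:  # ignore self-loops between identical forms
                network[word].add(governor)
                network.setdefault(governor, set()).add(word)
    return network


# Hypothetical two-sentence treebank.
toy_treebank = [
    [("the", 2), ("student", 3), ("read", 0), ("a", 5), ("book", 3)],
    [("the", 2), ("book", 3), ("fell", 0)],
]
network = build_dependency_network(toy_treebank)
print(sorted(network["book"]))  # ['a', 'fell', 'read', 'the']
print(len(network["book"]))     # 4
```

In this toy network the degree of "book" is 4, the number of distinct words it combines with, which is the quantity the abstract relates to valency.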
Dependency distance as a metric of language comprehension difficulty
Journal of Cognitive Science 9(2):159-191, 2008
Linguistic complexity is a measure of the cognitive difficulty of human language processing. The present paper proposes dependency distance, in the framework of dependency grammar, as an insightful metric of complexity. Three hypotheses are formulated: (1) the human language parser prefers linear orders that minimize the average dependency distance of the recognized sentence; (2) there is a threshold that the average dependency distance of most sentences or texts of human languages does not exceed; (3) grammar and cognition combine to keep dependency distance within the threshold. Twenty corpora from different languages with dependency syntactic annotation are used to test these hypotheses. The paper reports the average dependency distance in these corpora and analyzes the factors that influence dependency distance. The findings support all three hypotheses: average dependency distance tends to be minimized in human language, there is a threshold of less than 3 words in average dependency distance, and grammar plays an important role in constraining distance. Some questions remain open for further research.
Europhysics Letters 83:18002, 2008
That almost all language networks are small-world and scale-free raises the question of whether syntax plays a role in measuring the complexity of a language network. To answer this question, we built two random language (dependency) networks based on a dependency syntactic network and investigated the complexity of these three language networks to see whether the non-syntactic ones have network indicators similar to the syntactic one. The results show that all three networks are small-world and scale-free. While syntax influences the indicators of a complex network, being scale-free is only a necessary but not a sufficient condition for judging whether a network is syntactic or non-syntactic. Network analysis focuses on the global organization of a language; it may not reflect the subtle syntactic differences of sentence structure.
2007
Probability distribution of dependency distance
Glottometrics 15:1-12, 2007
This paper investigates probability distributions of dependency distances in six texts extracted from a Chinese dependency treebank. The fitting results reveal that the investigated distribution can be well captured by the right-truncated Zeta distribution. In order to restrict the model only to natural language, two samples with randomly generated governors are investigated. One of them can be described by, for example, the Hyperpoisson distribution, while the other satisfies the Zeta distribution. The paper also presents a study of the sequential plot and mean dependency distance of the six texts under three analyses (one syntactic and two random). Of these three analyses, the syntactic analysis has the minimum mean dependency distance.
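The comparison between a syntactic analysis and randomly generated governors can be illustrated with a small baseline. The abstract does not specify how the two random samples were generated, so the procedure below is only an assumed stand-in: every non-root word receives a governor drawn uniformly from the other positions, with no guarantee that the result is a well-formed tree. All names and the example sentence are hypothetical.

```python
import random
from typing import List


def mean_dependency_distance(heads: List[int]) -> float:
    """Mean absolute distance between each non-root word and its governor."""
    distances = [abs(g - d) for d, g in enumerate(heads, start=1) if g != 0]
    return sum(distances) / len(distances) if distances else 0.0


def random_heads(n: int, root: int, rng: random.Random) -> List[int]:
    """Assumed randomization: each non-root word gets a uniformly random governor.

    The result is not guaranteed to form a tree; it is a baseline only.
    """
    heads = []
    for position in range(1, n + 1):
        if position == root:
            heads.append(0)
        else:
            candidates = [p for p in range(1, n + 1) if p != position]
            heads.append(rng.choice(candidates))
    return heads


# Hypothetical syntactic analysis of a five-word sentence (root at position 3).
syntactic = [2, 3, 0, 5, 3]
rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [random_heads(5, 3, rng) for _ in range(1000)]
random_mean = sum(mean_dependency_distance(h) for h in samples) / len(samples)

print(mean_dependency_distance(syntactic))  # 1.25
print(random_mean)  # average over the random baseline
```

With a single short sentence the gap between the two values says little; the comparison becomes meaningful only over whole texts, as in the paper.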