Comparing different quantitative metrics of language proximity: North-Caucasian languages


V.D. Solovyev, R.B. Akhtjamov

The paper compares the North-Caucasian languages using different proximity metrics based on grammatical, lexical and genetic databases. Statistical analysis of the North-Caucasian data detects a high correlation between genetics and geography, and a significant correlation between lexis and all other parameters. Grammar correlates least with the other parameters. All the data described in the article agree well with the genealogical classification at all levels of the hierarchy (languages, branches, families), which allows them to be used as predictors of language similarity. Local analysis of the main subgroups of the North-Caucasian languages revealed the following patterns: 1. In all the examples, lexical distances correlate with genealogical proximity and, where such proximity has not been established, with geographical proximity. Grammatical distance agrees somewhat less fully with established affinity, and genetic distance agrees with it less accurately still. 2. Not only geographical distance but also the existence of a common border is an important factor. 3. As a rule, if several of the described metrics deviate from genealogical proximity, they correlate with each other, which may indicate common factors affecting the development of languages and peoples. Most often these deviations can be explained by geographical proximity; in some cases additional research is needed. 4. Where language affinity has not been established, the considered metrics can be used to generate hypotheses about it. Comparing diverse quantitative data provides new approaches to determining migration paths.
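
As an illustration of what correlating two proximity metrics can look like in practice, the following is a minimal sketch, not the procedure used in the paper. It assumes each metric is given as a symmetric pairwise distance matrix over the same ordered set of languages and computes the Pearson correlation over the unique language pairs. The function names and the four-language matrices are hypothetical placeholders, not data from the study.

```python
from itertools import combinations
from math import sqrt


def upper_triangle(matrix):
    """Return the distances for each unordered pair (i < j) of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i, j in combinations(range(n), 2)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def matrix_correlation(a, b):
    """Correlate two distance matrices over their unique language pairs."""
    return pearson(upper_triangle(a), upper_triangle(b))


# Hypothetical toy matrices for four unnamed languages (made-up values).
lexical = [
    [0.0, 0.2, 0.7, 0.8],
    [0.2, 0.0, 0.6, 0.9],
    [0.7, 0.6, 0.0, 0.3],
    [0.8, 0.9, 0.3, 0.0],
]
geographic = [
    [0, 50, 300, 400],
    [50, 0, 280, 420],
    [300, 280, 0, 90],
    [400, 420, 90, 0],
]

r = matrix_correlation(lexical, geographic)
print(f"lexical vs. geographic correlation: {r:.3f}")
```

Because the entries of a distance matrix are not independent observations, the significance of such a coefficient is usually assessed with a permutation procedure such as the Mantel test rather than with the standard test for Pearson's r.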

