Journal: Dynamics of Complex Systems - XXI century, No. 4, 2016
Article in issue:
Classification of text language identification programs
Keywords:
identification principle
identification method
text language identification
language identification program
detector of text language affiliation
Authors:
S.N. Kalegin - Post-graduate Student, Head of Sector of STD, JSC «MNITI» (Moscow)
E-mail: ksn@mniti.ru
Abstract:
With the growing volume of unstructured multilingual data in the modern technogenic world and the development of communication media such as television and the Internet, the need for language identification systems is increasing. This article presents a classification of text language identification programs based on their identification principles, together with a comparative test that revealed their relevant features. The classification allows developers to take a more informed approach to designing such software and helps users choose it more deliberately. In addition, the classification of language identifiers can be used in teaching materials and specialized books for students of technical faculties and for specialists in artificial intelligence, as well as in the development of new language identification systems.
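To make the idea of an "identification principle" concrete, here is a minimal Python sketch of one widely used statistical principle: comparing character n-gram frequency profiles. It is not the author's classification and not the implementation of any program discussed in the article. The tiny training snippets, the choice of trigrams, and the cosine-similarity score are all illustrative assumptions; a real identifier would be trained on much larger corpora.

```python
from collections import Counter
import math

# Hypothetical toy training samples (assumption: real systems use large corpora).
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog and the cat sat on the mat while it was raining",
    "german": "der schnelle braune fuchs springt über den faulen hund und die katze sitzt auf der matte",
    "french": "le renard brun rapide saute par dessus le chien paresseux et le chat est assis sur le tapis",
}


def trigram_profile(text: str) -> Counter:
    """Count character trigrams; padding with spaces captures word beginnings and endings."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse trigram frequency vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())  # missing grams count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Build one reference profile per language from the hypothetical samples.
PROFILES = {lang: trigram_profile(sample) for lang, sample in SAMPLES.items()}


def identify(text: str) -> str:
    """Return the language whose reference profile is most similar to the input text."""
    query = trigram_profile(text)
    return max(PROFILES, key=lambda lang: cosine(query, PROFILES[lang]))


if __name__ == "__main__":
    for phrase in ["the dog is on the mat", "der hund und die katze", "le chat et le chien"]:
        print(phrase, "->", identify(phrase))
```

Profile-based methods of this kind need no dictionary, which is why they tolerate unknown words; the trade-off is lower reliability on very short inputs, where only a few n-grams are available for comparison.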
Pages: 27-33