Journal "Highly Available Systems", No. 3, 2014
Article in this issue:
Automatic construction of a formalized representation of the semantic content of unstructured texts from mass media and social networks
Keywords:
automated text processing
semantic analysis
formal description of text
semantic structure
data extraction
linguistic software
declarative means
Authors:
A.A. Khoroshilov - Dr.Sc. (Eng.), Leading Research Scientist, Institute of Informatics Problems, Russian Academy of Sciences; Professor at the Moscow Aviation Institute (National Research University). E-mail: khoroshilov@mail.ru
Yu. V. Nikitin - Applicant, Institute of Informatics Problems, Russian Academy of Sciences. E-mail: yuri.v.nikitin@gmail.com
Alexei A. Khoroshilov - Ph.D. (Eng.), Research Scientist, Institute of Informatics Problems, Russian Academy of Sciences. E-mail: alex_khoroshilov@mail.ru
V. I. Budzko - Dr.Sc. (Eng.), Professor, Deputy Director, Institute of Informatics Problems, Russian Academy of Sciences. E-mail: vbudzko@ipiran.ru
Abstract:
Socio-political monitoring requires sophisticated tools for the semantic analysis of unstructured text information.
The authors propose methods for the automatic processing of text messages from mass media and social networks that yield quantitative indicators of public attitudes toward the activities of the authorities.
Linguistic processing of text messages should include the automatic construction of a formalized semantic structure of each text and the clustering of texts with similar semantic content (grouping texts by the news events, or newsbreaks, they report).
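Grouping texts by newsbreak can be illustrated with a minimal Python sketch. It swaps in a deliberately simple technique, single-pass (leader) clustering by Jaccard similarity of word sets, which is an assumption and not the authors' method; the function names, the 4-letter word filter and the 0.3 threshold are all hypothetical. Python's `\w` matches Cyrillic letters in `str` patterns, so the same code runs on Russian text.

```python
import re
from typing import List, Set

def content_words(text: str, min_len: int = 4) -> Set[str]:
    """Crude stand-in for full linguistic analysis: lowercase words of length >= min_len."""
    return {w for w in re.findall(r"\w+", text.lower()) if len(w) >= min_len}

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of words common to both sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_by_newsbreak(texts: List[str], threshold: float = 0.3) -> List[List[int]]:
    """Single-pass leader clustering: a text joins the first cluster whose
    leader (first text) is similar enough, otherwise it opens a new cluster."""
    clusters: List[List[int]] = []
    leaders: List[Set[str]] = []
    for i, text in enumerate(texts):
        words = content_words(text)
        for members, leader in zip(clusters, leaders):
            if jaccard(words, leader) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
            leaders.append(words)
    return clusters

news = [
    "Parliament approved the new budget on Tuesday",
    "On Tuesday parliament approved the budget for next year",
    "Heavy snowfall paralysed traffic in Moscow",
]
print(cluster_by_newsbreak(news))  # → [[0, 1], [2]]
```

Leader clustering is order-dependent and compares each text only with a cluster's first member; it is chosen here for brevity, not accuracy.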
The formalized representation of a text's semantic structure must contain an automatically generated abstract of the document, a list of key phrases, a list of identified objects and predicates, a list of relations between the objects defined in the text, and bibliographic details.
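One possible way to hold the components listed above is sketched below as Python dataclasses. The class and field names, the subject/predicate/object shape of a relation and the sample values are hypothetical; the source specifies only which components the representation must contain, not their format.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List

@dataclass
class Relation:
    subject: str     # first object
    predicate: str   # predicate linking the objects
    obj: str         # second object

@dataclass
class SemanticRepresentation:
    abstract: str                                   # automatic abstract
    key_phrases: List[str] = field(default_factory=list)
    objects: List[str] = field(default_factory=list)
    predicates: List[str] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)
    bibliographic: Dict[str, str] = field(default_factory=dict)

    def objects_related_to(self, name: str) -> List[str]:
        """Objects linked to `name` by any relation, in either direction."""
        related: List[str] = []
        for rel in self.relations:
            if rel.subject == name:
                related.append(rel.obj)
            elif rel.obj == name:
                related.append(rel.subject)
        return related

rep = SemanticRepresentation(
    abstract="Parliament approved the budget.",
    key_phrases=["state budget"],
    objects=["parliament", "budget"],
    predicates=["approve"],
    relations=[Relation("parliament", "approve", "budget")],
    bibliographic={"source": "hypothetical news site", "date": "2014-03-01"},
)
print(rep.objects_related_to("budget"))  # → ['parliament']
print(asdict(rep)["relations"][0])       # plain dict, ready for JSON export
```

`asdict` converts the nested dataclasses into plain dictionaries, so the whole structure can be stored or exchanged as JSON.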
The linguistic software (lingware) used for problems of this type should implement the basic linguistic procedures: graphematic analysis of the text, morphological analysis of words, semantic-syntactic analysis of texts, conceptual analysis of texts, and distributional-statistical analysis of texts.
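To make the idea of chained procedures concrete, here is a minimal Python sketch of a staged pipeline. It is an assumption and does not reflect how Metafraz is built: only a regex-based graphematic stage, a placeholder morphological stage (lowercasing stands in for lemmatization) and a frequency-counting distributional-statistical stage are shown; the semantic-syntactic and conceptual stages are omitted, and all function names are hypothetical.

```python
import re
from collections import Counter
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]            # document state passed from stage to stage
Stage = Callable[[Doc], Doc]

def graphematic(doc: Doc) -> Doc:
    """Split raw text into sentences (at . ! ? plus whitespace) and word tokens."""
    sentences = re.split(r"(?<=[.!?])\s+", doc["text"].strip())
    doc["sentences"] = [re.findall(r"\w+", s) for s in sentences if s]
    return doc

def morphological(doc: Doc) -> Doc:
    """Placeholder: lowercasing instead of real lemmatization."""
    doc["lemmas"] = [[tok.lower() for tok in sent] for sent in doc["sentences"]]
    return doc

def distributional_statistical(doc: Doc) -> Doc:
    """Count how often each normalized word occurs in the text."""
    doc["frequencies"] = Counter(w for sent in doc["lemmas"] for w in sent)
    return doc

def run_pipeline(text: str, stages: List[Stage]) -> Doc:
    doc: Doc = {"text": text}
    for stage in stages:
        doc = stage(doc)
    return doc

result = run_pipeline("The budget was approved. Budget debates continue!",
                      [graphematic, morphological, distributional_statistical])
print(result["frequencies"].most_common(1))  # → [('budget', 2)]
```

Because every stage reads and extends the same document state, further stages could be inserted into the list without changing the others.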
The declarative components of the lingware should include conceptual lexicons, which are to be built by automated lexical and semantic-syntactic analysis of a representative text corpus.
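A first automated step toward such a lexicon might be collecting recurrent multi-word phrases from the corpus. The sketch below substitutes plain n-gram frequency counting with a stop-word filter for the lexical and semantic-syntactic analysis the source describes; the stop-word list, the phrase lengths and `min_count` are hypothetical choices, and the candidates would still need review before entering a lexicon.

```python
import re
from collections import Counter
from typing import Iterable, List, Set, Tuple

# Illustrative English stop-word list; a real lexicon needs a language-specific one.
STOPWORDS: Set[str] = {"the", "a", "an", "of", "and", "by", "in", "on", "is", "was"}

def ngrams(tokens: List[str], n: int) -> Iterable[Tuple[str, ...]]:
    """All contiguous n-token windows of `tokens`."""
    return zip(*(tokens[i:] for i in range(n)))

def candidate_phrases(corpus: Iterable[str], sizes: Tuple[int, ...] = (2, 3),
                      min_count: int = 2) -> List[Tuple[str, int]]:
    """Phrases of the given lengths that occur at least `min_count` times."""
    counts: Counter = Counter()
    for text in corpus:
        tokens = re.findall(r"\w+", text.lower())
        for n in sizes:
            for gram in ngrams(tokens, n):
                # Skip windows that begin or end with a function word.
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    return [(phrase, c) for phrase, c in counts.most_common() if c >= min_count]

corpus = [
    "The state budget was approved by parliament.",
    "Parliament discussed the state budget deficit.",
    "Critics say the state budget is unrealistic.",
]
print(candidate_phrases(corpus))  # → [('state budget', 3)]
```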
The authors simulated automated text processing on news messages from online media and on posts by users of the social network "VKontakte".
In their experiments, the authors used the Metafraz lingware, which is based on the theoretical concept of idiomatic conceptual analysis of texts.
Pages: 52-69