Journal: Highly Available Systems, No. 3, 2014
Article in issue:
Automatic construction of a formalized representation of semantic contents of unstructured texts of mass-media and social networks
Authors:
A.A. Khoroshilov - Dr.Sc. (Eng.), Leading Research Scientist, Institute of Informatics Problems, Russian Academy of Sciences, Professor of the Moscow Aviation Institute (National Research University). E-mail: khoroshilov@mail.ru
Yu. V. Nikitin - Applicant, Institute of Informatics Problems, Russian Academy of Sciences. E-mail: yuri.v.nikitin@gmail.com
Alexei A. Khoroshilov - Ph.D. (Eng.), Research Scientist, Institute of Informatics Problems, Russian Academy of Sciences. E-mail: alex_khoroshilov@mail.ru
V. I. Budzko - Dr.Sc. (Eng.), Professor, Deputy Director, Institute of Informatics Problems, Russian Academy of Sciences. E-mail: vbudzko@ipiran.ru
Abstract:
Social and political monitoring requires sophisticated tools for the semantic analysis of unstructured text. The authors propose methods for the automatic processing of messages from online media and social networks that yield quantitative indicators of public attitudes towards the activities of the authorities. Linguistic processing of these messages should include the automatic construction of a formalized semantic structure for each text and the clustering of texts with similar semantic content (that is, grouping texts by the news event they cover). The formalized representation of a text's semantic structure must contain an automatically generated abstract of the document, a list of key phrases, a list of extracted objects and predicates, a list of relations between the objects identified in the text, and bibliographic details. Linguistic software (lingware) for problems of this type should implement the basic linguistic procedures: graphematic analysis of the text, morphological analysis of words, semantic-syntactic analysis, conceptual analysis, and distributional-statistical analysis of texts. The declarative components of the lingware should include conceptual lexicons built through automated lexical and semantic-syntactic processing and analysis of representative text corpora. The authors modelled the automated processing of texts using news items from online media and messages from users of the social network "VKontakte". The experiments used the Metafraz lingware, which is based on the theoretical concept of idiomatic conceptual analysis of texts.
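As a rough illustration of the formalized representation listed in the abstract, the sketch below models one processed text as a record and groups records by key-phrase overlap. It is not the Metafraz implementation: the class and function names (`FormalizedText`, `Relation`, `jaccard`, `cluster`), the Jaccard measure, the greedy grouping and the 0.5 threshold are all assumptions chosen for brevity, standing in for whatever similarity measure and clustering procedure the article actually uses.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Relation:
    """A link between two objects found in a text (hypothetical shape)."""
    subject: str
    predicate: str
    obj: str


@dataclass
class FormalizedText:
    """Hypothetical record holding the components named in the abstract."""
    abstract: str                      # automatically generated summary
    key_phrases: list[str]             # key phrases of the text
    objects: list[str]                 # extracted objects
    predicates: list[str]              # extracted predicates
    relations: list[Relation] = field(default_factory=list)
    bibliographic: dict[str, str] = field(default_factory=dict)


def jaccard(a: FormalizedText, b: FormalizedText) -> float:
    """Key-phrase overlap, used here only as a stand-in for semantic similarity."""
    sa, sb = set(a.key_phrases), set(b.key_phrases)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def cluster(texts: list[FormalizedText],
            threshold: float = 0.5) -> list[list[FormalizedText]]:
    """Greedy single pass: a text joins the first cluster whose first member
    is similar enough, otherwise it starts a new cluster."""
    clusters: list[list[FormalizedText]] = []
    for text in texts:
        for group in clusters:
            if jaccard(group[0], text) >= threshold:
                group.append(text)
                break
        else:
            clusters.append([text])
    return clusters
```

With made-up records, two texts sharing the key phrases "city budget" and "mayor" out of four distinct phrases score 0.5 and fall into one group, while a text about an unrelated event starts its own. A real system would compare richer features (objects, predicates, relations) than key phrases alone.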
Pages: 52-69