Journal "Highly Available Systems", No. 3, 2014
Article in issue:
Semantic structuring of textual knowledge for a system of analytical monitoring of big data in the social sphere
Authors:
Yu. I. Morozova - Research Scientist, Institute of Informatics Problems, Russian Academy of Sciences. E-mail: yulia-ipi@yandex.ru
E. B. Kozerenko - Ph.D. (Phil.), Head of Laboratory, Institute of Informatics Problems, Russian Academy of Sciences. E-mail: kozerenko@mail.ru
V. I. Budzko - Dr.Sc. (Eng.), Deputy Director for Sciences, Institute of Informatics Problems, Russian Academy of Sciences. E-mail: vbudzko@ipiran.ru
K. I. Kuznetsov - Research Scientist, Institute of Informatics Problems, Russian Academy of Sciences. E-mail: k.smith@mail.ru
M. M. Charnine - Ph.D. (Eng.), Senior Research Scientist, Institute of Informatics Problems, Russian Academy of Sciences. E-mail: mc@keywen.com
Abstract:
Modern computer technology makes it possible to accumulate and process extremely large volumes of information (Big Data). This article gives an overview of Internet monitoring systems that work with the Russian language and describes experiments aimed at developing an analytical monitoring system for the domain "socio-political life of the regions of the Russian Federation" (using one of the federal autonomous regions as an example). The technological component of the system is a semantically oriented linguistic processor that structures and retrieves knowledge from natural-language texts. The article presents semantic methods and tools for structuring and retrieving relevant information from Russian-language texts from the media and the Internet, including blogs, tweets and social networks. We study the properties of natural language as a generic modeling tool. The basic properties of language as a tool for semantic structuring are hierarchy, associativity, polysemy and synonymy. Transforming unstructured information into a clearly structured model for a database is both a fundamental task, related to the problem of modeling the interaction between language and thinking, and an applied one: organizing a database for a specified subject area. The aim of the research and development is to create an analytical monitoring system for the social sphere.
Pages: 21-34