Methods and models of intelligent text analysis in knowledge management systems


Journal: Information-measuring and Control Systems, No. 12, 2016


Keywords: intelligent text analysis; hybrid intelligent information knowledge management system; diverse knowledge management; data mining

Authors:

A.M. Andreev - Ph.D. (Eng.), Associate Professor, Department of Computer Systems, Complexes and Networks, Bauman Moscow State Technical University. E-mail: arkandreev@gmail.com

D.V. Berezkin - Ph.D. (Eng.), Senior Lecturer, Department of Computer Systems, Complexes and Networks, Bauman Moscow State Technical University. E-mail: berezkind@bmstu.ru

Abstract:

The article investigates the use of modern methods and models of intelligent text analysis in knowledge management systems. Applying these models and methods significantly broadens the scope of diverse knowledge management systems within complex information systems that are oriented toward acquiring and processing information under uncertainty. The basic methods of text analysis are reviewed, and the authors propose implementing the stages of linguistic analysis by building a hybrid intelligent information system. They examine the need to analyze information in structured and semi-structured (text) form simultaneously throughout the entire knowledge management life cycle of complex scientific and technical projects. To meet this need, the article proposes an ontological mapping based on the neutral integration model of the ISO 15926 standard.

A hybrid intelligent information knowledge management system has been developed, and the article describes how its main subsystems are built. The subsystem for extracting information from disparate sources periodically downloads data automatically from Internet sites and specialized sources. When text documents are extracted, the quality of the downloaded information is controlled: subsystem failures caused by changes in the layout of a source site are detected promptly. To maintain the quality of the accumulated text, near-duplicates of previously downloaded messages are detected and removed. The storage subsystem, as a repository of heterogeneous data, uses several different databases. Text documents can be stored in the object database ODB-Jupiter or in relational databases such as PostgreSQL and Microsoft SQL Server.

The data analysis and forecasting subsystem performs analytical processing of the accumulated data. Basic methods for analyzing text documents have been developed and implemented, including automatic clustering and categorization, summarization, concept extraction, and semantic search. An important function of this subsystem is analyzing and forecasting how situations develop on the basis of textual information. It relies on identifying, in the stream of text messages, events related to given topics. From the detected events, possible situations are traced and scenarios of their future development are built.

The subsystem for ontological modeling of scientific and educational knowledge includes tools for automatically extracting structured information from text using Information Extraction techniques. Extraction relies on rules that operate on morphological and object-oriented features and are generated by machine learning. The knowledge integration subsystem provides ontological mapping of heterogeneous knowledge bases represented as formal ontologies in accordance with the W3C Semantic Web family of standards.

The proposed models and methods have been used successfully to manage nuclear knowledge, to automate the legislative process, and to solve a number of problems in collecting and processing reference, statistical, and analytical information. The system can be used to identify trends and promising directions in the development of science and education, to address public and national security problems, to find innovative methods and technologies, and to support decision-making.
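The near-duplicate removal step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' method, which the abstract does not specify; it substitutes a common baseline, word shingles compared by Jaccard similarity. The function names, the shingle size `k`, and the `threshold` value are all assumptions chosen for illustration.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles of a message (hypothetical helper)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two shingle sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def drop_near_duplicates(messages, threshold=0.6, k=3):
    """Keep a message only if it is not too similar to any message kept earlier.

    The threshold of 0.6 is an assumed value, not one taken from the article.
    """
    kept, kept_shingles = [], []
    for msg in messages:
        s = shingles(msg, k)
        if any(jaccard(s, prev) >= threshold for prev in kept_shingles):
            continue  # near-duplicate of an earlier message
        kept.append(msg)
        kept_shingles.append(s)
    return kept


stream = [
    "the reactor was shut down for planned maintenance on monday",
    "the reactor was shut down for planned maintenance on tuesday",
    "parliament passed a new data protection law",
]
unique = drop_near_duplicates(stream)
```

Here the second message shares seven of nine distinct shingles with the first and is dropped, while the third is kept. Comparing every new message with all kept ones is quadratic; a production stream would more plausibly use an index such as MinHash, which this sketch omits.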

Pages: 111-121
