Journal Information-measuring and Control Systems, No. 12, 2016
Methods and models of intelligent text analysis in knowledge management systems
A.M. Andreev - Ph.D. (Eng.), Associate Professor, Department of Computer Systems, Complexes and Networks, Bauman Moscow State Technical University
D.V. Berezkin - Ph.D. (Eng.), Senior Lecturer, Department of Computer Systems, Complexes and Networks, Bauman Moscow State Technical University
The article investigates the use of modern methods and models of intelligent text analysis in knowledge management systems. The models and methods studied make it possible to significantly broaden the range of applications of knowledge management systems within complex information systems that acquire and process information under uncertainty. The article reviews the basic methods of text analysis and proposes implementing the stages of linguistic analysis by building a hybrid intelligent information system. The authors argue that information presented in both structured and semi-structured (text) form must be analyzed jointly throughout the entire knowledge management life cycle of complex scientific and technical projects. To achieve this, they propose ontological mapping based on the neutral integration model of the ISO 15926 standard. A hybrid intelligent knowledge management information system has been developed, and the article describes how its main subsystems are built. The subsystem for extracting information from disparate sources periodically downloads data automatically from Internet sites and specialized sources. When text documents are extracted, the quality of the downloaded information is monitored: failures of the subsystem caused by changes in a source site's layout are detected promptly. To ensure the quality of the accumulated text information, near-duplicates of previously downloaded messages are detected and removed. The storage subsystem keeps heterogeneous data in different databases: text documents can be stored in the object database ODB-Jupiter as well as in relational databases such as PostgreSQL and Microsoft SQL Server. The data analysis and prediction subsystem performs analytical processing of the accumulated data.
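The abstract does not say how near-duplicate messages are detected. As an illustration only, the step can be sketched with a common technique, w-shingling with Jaccard similarity: each message is reduced to its set of overlapping word triples, and a new message is dropped if its overlap with an already kept message reaches a threshold. The function names, the shingle width `w=3` and the threshold `0.7` are hypothetical choices for this sketch, not parameters taken from the described system.

```python
import re


def shingles(text: str, w: int = 3) -> set[tuple[str, ...]]:
    """Return the set of contiguous w-word shingles of a lower-cased text."""
    words = re.findall(r"\w+", text.lower())  # punctuation is ignored
    if len(words) < w:
        # Very short texts are represented by a single (shorter) shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(messages: list[str], threshold: float = 0.7, w: int = 3) -> list[str]:
    """Keep a message unless it is a near-duplicate of a message kept earlier."""
    kept: list[str] = []
    kept_shingles: list[set] = []
    for msg in messages:
        sh = shingles(msg, w)
        if any(jaccard(sh, prev) >= threshold for prev in kept_shingles):
            continue  # near-duplicate of a previously downloaded message
        kept.append(msg)
        kept_shingles.append(sh)
    return kept


if __name__ == "__main__":
    news = [
        "The reactor was shut down for scheduled maintenance on Monday.",
        "The reactor was shut down for scheduled maintenance on Monday morning.",
        "Parliament adopted a new law on nuclear safety.",
    ]
    print(deduplicate(news))  # the second message is dropped
```

The pairwise comparison is quadratic in the number of kept messages. For a large, continuously growing collection, MinHash signatures with locality-sensitive hashing are the usual way to approximate the same Jaccard test more cheaply.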
Basic methods for analyzing text documents have been developed and implemented, including automatic clustering and categorization, summarization, concept extraction, and semantic search. An important function of the subsystem is analyzing and forecasting how situations develop, based on textual information. This relies on identifying, in the stream of text messages, event messages related to given topics. Possible situations are traced from the detected events, and scenarios of their future development are constructed. The ontological modeling subsystem for scientific and educational expertise includes tools for automatically extracting structured information from text using Information Extraction techniques. Extraction is based on rules that operate on morphological and object-oriented features and are generated by machine learning. The expertise integration subsystem performs ontological mapping of heterogeneous knowledge bases represented as formal ontologies, in accordance with the recommendations of the Semantic Web family of standards of the World Wide Web Consortium (W3C). The models and methods proposed in the article have been successfully used to manage nuclear knowledge, to automate the legislative process, and to solve a number of problems related to collecting and processing reference, statistical, and analytical information. The system can be used to identify trends and promising directions in the development of science and education, to address public and national security problems, to find innovative methods and technologies, and to support decision-making.
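The abstract lists automatic categorization among the implemented methods but does not describe the algorithm. A minimal sketch, assuming a swapped-in textbook technique (nearest-centroid classification over bag-of-words vectors with cosine similarity), shows the general idea: each category is represented by the summed word counts of its training documents, and a new document is assigned to the category whose centroid it is most similar to. All names and the toy training data here are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts of a text."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_centroids(training: dict[str, list[str]]) -> dict[str, Counter]:
    """Sum the word counts of each category's documents.

    Dividing by the number of documents would only rescale the vector,
    which does not change cosine similarity, so the sum is used directly.
    """
    centroids: dict[str, Counter] = {}
    for category, docs in training.items():
        total: Counter = Counter()
        for doc in docs:
            total.update(bag_of_words(doc))
        centroids[category] = total
    return centroids


def categorize(text: str, centroids: dict[str, Counter]) -> str:
    """Return the category whose centroid is most similar to the text."""
    vec = bag_of_words(text)
    # On ties (e.g. no shared words at all), max() returns the first category.
    return max(centroids, key=lambda c: cosine(vec, centroids[c]))


if __name__ == "__main__":
    centroids = build_centroids({
        "nuclear": ["reactor safety inspection at the nuclear plant",
                    "nuclear fuel and reactor maintenance"],
        "legislation": ["parliament adopted the draft law",
                        "the law on education was amended by parliament"],
    })
    print(categorize("new reactor safety rules", centroids))
    print(categorize("parliament passed a law", centroids))
```

In practice, TF-IDF weighting, stop-word removal and morphological normalization (important for a highly inflected language such as Russian) would usually be added before computing similarities.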
