Journal "Highly Available Systems", No. 4, 2015
Article in issue:
Processing unstructured textual data for support of search and rescue operations
D.A. Devyatkin - Junior Research Scientist, Institute for Systems Analysis of FRC CSC RAS (Moscow). E-mail: devyatkin@isa.ru
A.O. Shelmanov - Junior Research Scientist, Institute for Systems Analysis of FRC CSC RAS (Moscow). E-mail: shelmanov@isa.ru
The paper discusses the main scientific and technical problems of creating methods and software tools that process unstructured textual data to support search and rescue operations. We review systems that use data and messages from social media for information and analytical support of response and recovery operations in emergency situations. The paper considers methods for focused crawling, for preliminary parsing of crawled data, and for information extraction from natural language texts. Against the background of this review, we propose approaches to the tasks that arise when developing methods and software tools for processing unstructured data and providing search and analytical support for search and rescue operations. We propose intelligent (ontology-based) crawl strategies for focused crawling and machine learning techniques for classifying individual pages of target resources. Natural language processing and indexing of texts will be performed by means of the Exactus platform. Rule-based approaches as well as machine learning techniques will be adapted to extract information from natural language texts related to emergency situations. Ontological resources and lexicons will be created to extract geographical objects and the names of ships and aircraft from texts. Structured data will be stored in distributed, scalable NoSQL databases that can load and process huge amounts of data given sufficient hardware. We suggest requirements for software tools that support search and rescue operations. To satisfy these requirements, we propose a distributed service-based architecture. It provides the ability to process large streams of information gathered online, along with scalability, information security, and low cost of implementation and maintenance of intelligent data processing systems.
We plan to experimentally evaluate the considered methods and software tools on freely available retrospective data about emergencies that occurred in the Arctic zone.
Pages: 45-60


