Journal: Information-measuring and Control Systems, № 2, 2013
Article in issue:
Extraction of significant information from files of unstructured texts
Authors:
A.I. Zaharenkov, A.V. Sokolov
Abstract:
The extraction of facts from natural-language data should be one of the basic functions of an information-analytical system (IAS). To this end, a fact model and a knowledge base model should be built within the IAS. These models should meet the following requirements: a thesaurus is present; the data to be processed are classified into rubrics; the data are registered in a way that supports contextual reference search; and there is a rule base that can be extended while the IAS is in operation. Facts are described by a model with three slots: a slot with the name of the subject of the fact (the subject initiates the action); a slot with the name of the object of the fact (the object describes the result of the action); and a slot with a predicate (the semantic relation between the subject and the object). Existing fact extraction methods [1-4] extract specified entities from the text (names of organizations, names of settlements, etc.) and then search for relations between them. However, these methods have the following limitations: they do not provide for generating summaries and excerpts from the text documents under consideration; and they cannot resolve uncertainty when identical facts occur in different sources of the initial data. A method has been developed for extracting facts from text documents in Russian and English, based on rules set by experts. To implement the method, a knowledge base defining the rules for extracting entities and the relations between them is generated in advance. The method is a sequence of stages that identify minimal syntactic units and establish links between them, with automatic generation of summaries.
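The three-slot fact model and the idea of expert-defined extraction rules can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the rule format, so the regular-expression rules, the `Fact` class, the `RULES` list, and the `extract_facts` function below are all hypothetical names chosen for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """A fact with the three slots named in the abstract."""
    subject: str    # name of the subject (initiates the action)
    predicate: str  # semantic relation between subject and object
    obj: str        # name of the object (describes the result of the action)


# Hypothetical expert rules (an assumption, not the paper's rule format):
# each rule pairs a regex with named groups "subj" and "obj" with a
# predicate label.
RULES = [
    (re.compile(r"(?P<subj>[A-Z]\w+) acquired (?P<obj>[A-Z]\w+)"), "acquired"),
    (re.compile(r"(?P<subj>[A-Z]\w+) is located in (?P<obj>[A-Z]\w+)"), "located_in"),
]


def extract_facts(text):
    """Apply every rule to the text and collect the matched facts."""
    facts = []
    for pattern, predicate in RULES:
        for match in pattern.finditer(text):
            facts.append(Fact(match.group("subj"), predicate, match.group("obj")))
    return facts


if __name__ == "__main__":
    sample = "Acme acquired Globex. Globex is located in Springfield."
    for fact in extract_facts(sample):
        print(fact)
```

Keeping the rules in a plain list mirrors the requirement that the rule base can be extended while the system is in operation: adding a rule does not require changing the extraction loop.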
Pages: 9-16
References
  1. Sunita Sarawagi. Information Extraction // Foundations and Trends in Databases. 2007. V. 1. № 3. P. 261-377.
  2. Hamish Cunningham. Automatic Information Extraction. 2004.
  3. Fabian M. Suchanek. Information Extraction. 2011 // suchanek.name/work/teaching/IE2011a.pdf.
  4. Fabian M. Suchanek. Natural Language Processing. 2011 // http://suchanek.name/work/teaching/NLP2011senegal/NLP2011senegal.pdf.
  5. Konichenko A.A., Sokolov A.V. Classification of signal sequences based on codes // Information-measuring and Control Systems. 2012. № 2.
  6. Butov A.L., Mirgaleev A.T. A method for extracting facts in information-analytical systems from information presented in natural language // Information-measuring and Control Systems. 2012. № 2.
  7. Mirgaleev A.T., Teplova V.V. An approach to formalizing the problem of estimating the time to evacuate people from a floor of an educational institution in fire-safety information-analytical systems // Information-measuring and Control Systems. 2012. № 2.