This paper briefly describes and classifies approaches to Information Extraction from text, surveys the existing application areas of Information Extraction systems, and discusses the qualitative and quantitative evaluation of such systems. It also examines the constraints that real-time text processing places on the architecture of these systems.
Information Extraction differs from other areas of information retrieval in that it emphasizes extracting information in the form of frames that describe typical situations or entities. It is a broad and actively evolving research area that has many applications, yet its potential for application is far from exhausted. On the theoretical level, many interesting models and technologies for information extraction have been developed, and as a result the task is now solved with near-human accuracy. Nevertheless, the task of event extraction is far from solved and will require considerable effort and new ideas.
The paper covers the main directions in Information Extraction: the knowledge-based approach, the data-driven approach, and the combined approach. The approaches are compared on a set of parameters: labor input, required data volume, and degree of result interpretability. A method for evaluating Information Extraction systems based on precision and recall is described.
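To illustrate the evaluation measures named above, the following is a minimal sketch rather than the paper's own procedure. It assumes the usual set-based definitions: the first measure (commonly called precision) is the share of extracted items that are correct, and recall is the share of reference items that were found. The function name `precision_recall` and the example tuples are hypothetical.

```python
def precision_recall(extracted, gold):
    """Compare extracted items against a reference (gold) annotation.

    Both arguments are iterables of hashable items, e.g. (entity, type) tuples.
    Returns (precision, recall); a measure is 0.0 when its denominator is empty.
    """
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)  # items the system found that are also in the reference
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical example: the system returns 4 entities, 3 of which are in a 6-item reference.
gold = [("Acme", "ORG"), ("Beta", "ORG"), ("Paris", "LOC"),
        ("Smith", "PER"), ("Jones", "PER"), ("Berlin", "LOC")]
extracted = [("Acme", "ORG"), ("Paris", "LOC"), ("Smith", "PER"), ("Gamma", "ORG")]

p, r = precision_recall(extracted, gold)
print(p, r)
```

Reporting the two measures together matters because either one can be inflated on its own: a system that extracts very little may score high on precision, while one that extracts nearly everything may score high on recall.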
A general classification of Information Extraction tasks is proposed. An architecture for Information Extraction systems is also described, including components such as text preprocessing and co-reference resolution. A qualitative and quantitative assessment of the main results achieved is given. The paper has a fairly extensive bibliography of 29 titles, mostly contemporary works, which allows the reader to obtain a complete picture of this field of research.