Journal Neurocomputers, No. 1, 2013
Article in issue:
Technologies of extracting information about events in real time
Authors:
V.D. Solovyev
Abstract:
The paper gives a brief description and classification of approaches to Information Extraction from text, surveys the existing application areas of Information Extraction systems, and discusses the qualitative and quantitative evaluation of such systems. It also examines the constraints that real-time text processing imposes on system architecture. Information Extraction differs from other areas of information retrieval in its emphasis on extracting information in the form of frames describing typical situations or entities. It is a broad and actively developing research area with many applications, and its potential is far from exhausted. On the theoretical level, many interesting models and technologies for information extraction have been developed, bringing results close to human accuracy. Nevertheless, event extraction is far from solved and will require considerable effort and new ideas. The paper covers the main directions in Information Extraction: the knowledge-based approach, the data-driven approach, and the combined approach. The approaches are compared on a set of parameters: labor input, required data volume, and degree of result interpretability. A method for evaluating Information Extraction systems based on precision and recall is described. A general classification of Information Extraction tasks is proposed. An architecture for Information Extraction systems is also described, including stages such as text preprocessing and co-reference resolution. The main results achieved are assessed qualitatively and quantitatively. The paper has a fairly extensive bibliography of 29 titles, mostly contemporary works, which gives a comprehensive picture of this field of research.
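The evaluation measures mentioned in the abstract are commonly computed as precision and recall over sets of extracted items. The following is a minimal sketch under that assumption, not the paper's own procedure; the function name `precision_recall_f1`, the tuple representation of events, and the sample data are hypothetical and chosen only for illustration.

```python
def precision_recall_f1(extracted, gold):
    """Compare extracted items against a gold-standard annotation.

    Both arguments are iterables of hashable items (assumed here to be
    tuples such as (event_type, participant)); duplicates are ignored.
    """
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)

    # Precision: share of extracted items that are correct.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: share of gold items that the system found.
    recall = true_positives / len(gold) if gold else 0.0
    # F1: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical system output and gold annotation.
system_output = [
    ("acquisition", "CompanyA"),
    ("acquisition", "CompanyB"),
    ("merger", "CompanyC"),
    ("bankruptcy", "CompanyD"),  # not in the gold annotation
]
gold_annotation = [
    ("acquisition", "CompanyA"),
    ("acquisition", "CompanyB"),
    ("merger", "CompanyC"),
    ("merger", "CompanyE"),
    ("bankruptcy", "CompanyF"),
    ("lawsuit", "CompanyG"),
]

p, r, f = precision_recall_f1(system_output, gold_annotation)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Exact set matching is the strictest choice; evaluations of real systems often also credit partial matches, which this sketch does not model.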
Pages: 23-30
References
  1. Gerber, M., Gordon, A. S., and Sagae, K., Open-domain commonsense reasoning using discourse relations from a corpus of weblog stories. In Proceedings of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading (Stroudsburg, PA, USA, 2010), FAM-LbR '10, Association for Computational Linguistics. P. 43-51.
  2. Dey, L., Mahajan, A., and Haque Mirajul, S., Document clustering for event identification and trend analysis in market news. In Advances in Pattern Recognition, 2009. ICAPR '09. Seventh International Conference on (Feb. 2009). P. 103-106.
  3. Borsje, J., Hogenboom, F., Frasincar, F., Semi-Automatic Financial Events Discovery Based on Lexico-Semantic Patterns. International Journal of Web Engineering and Technology. 2010. 6(2). P. 115-140.
  4. Capet, P., Delavallade, T., Nakamura, T., Sandor, A., Tarsitano, C., Voyatzi, S., A Risk Assessment System with Automatic Extraction of Event Types. In Intelligent Information Processing IV, IFIP International Federation for Information Processing. V. 288. Springer Boston. 2008. P. 220-229.
  5. Frasincar, F., Borsje, J., Levering, L., A Semantic Web-Based Approach for Building Personalized News Services. International Journal of E-Business Research. 2009. 5(3). P. 35-53.
  6. Kamijo, S., Matsushita, Y., Ikeuchi, K., Sakauchi, M., Traffic monitoring and accident detection at intersections // IEEE Transactions on Intelligent Transportation Systems. 2000. 1(2). P. 108-118.
  7. Wei, C.P., Lee, Y.H., Event detection from Online News Documents for Supporting Environmental Scanning. Decision Support Systems. 2004. 36(4). P. 385-401.
  8. Nadeau, D., Sekine, S., A survey of named entity recognition and classification.
  9. Smrž, P. and Mrnuštík, M., Decipher-D4.1.1-WP4-BUT State of the art of event detection methods-PU. Report. Brno University of Technology. 2011.
  10. Etzioni, O., Cafarella, M., Downey, D., Popescu, A.-M., Shaked, T., Soderland, S., Weld, D. S., Yates, A., Unsupervised Named-Entity Extraction from the Web: An Experimental Study. Artificial Intelligence. 2005. 165. P. 91-134. Essex: Elsevier Science Publishers.
  11. Witten, I. H., Bray, Z., Mahoui, M., Teahan, W. J. Using Language Models for Generic Entity Extraction // In Proc. International Conference on Machine Learning. Text Mining. 1999.
  12. Maynard, D., Tablan, V., Ursu, C., Cunningham, H., Wilks, Y. Named Entity Recognition from Diverse Text Types // In Proc. Recent Advances in Natural Language Processing. 2001.
  13. Zhu, J., Uren, V., Motta, E., Espotter: Adaptive Named Entity Recognition for Web Browsing. In Proc. Conference Professional Knowledge Management // Intelligent IT Tools for Knowledge Management Systems. 2005.
  14. Brin, S., Extracting Patterns and Relations from the World Wide Web. In Proc. Conference of Extending Database Technology. Workshop on the Web and Databases. 1998.
  15. Cohen, W. W., Sarawagi, S., Exploiting Dictionaries in Named Entity Extraction: Combining Semi-Markov Extraction Processes and Data Integration Methods // In Proc. Conference on Knowledge Discovery in Data. 2004.
  16. Bick, E., A Named Entity Recognizer for Danish. In Proc. Conference on Language. 2004.
  17. Shen, D., Zhang, J., Zhou, G., Su, J., Tan, C. L., Effective Adaptation of a Hidden Markov Model-based Named Entity Recognizer for Biomedical Domain. In Proc. Conference of Association for Computational Linguistics. Natural Language Processing in Biomedicine. Resources and Evaluation. 2003.
  18. Settles, B., Biomedical Named Entity Recognition Using Conditional Random Fields and Rich Feature Sets. In Proc. Conference on Computational Linguistics. Joint Workshop on Natural Language Processing in Biomedicine and its Applications. 2004.
  19. Rindfleisch, T. C., Tanabe, L., Weinstein, J. N. EDGAR: Extraction of Drugs, Genes and Relations from the Biomedical Literature. In Proc. Pacific Symposium on Biocomputing. 2000.
  20. Narayanaswamy, Meenakshi, Ravikumar, K. E., Vijay-Shanker, K. A Biological Named Entity Recognizer. In Proc. Pacific Symposium on Biocomputing. 2003.
  21. Segers, R., van Erp, M., van der Meij, L., Aroyo, L., Schreiber, G., Wielinga, B., van Ossenbruggen, J., Oomen, J., and Jacobs, G. Hacking History: Automatic Historical Event Extraction for Enriching Cultural Heritage Multimedia Collections. In Proceedings of the 6th International Conference on Knowledge Capture KCAP11. 2011. P. 1-4.
  22. Vossen, P., Schreiber, G., and van Harmelen, F. The semantics of history: model, methods and application. http://www2.let.vu.nl/oz/cltl/semhis. 2009.
  23. Rizzi, V., Giunchiglia, F., Trecarichi, G., Teyssou, D., Murdock, V., de Polo, A., and Mezaour, A.-D. Project GLocal, Deliverable D1.1 - requirements for event modelling, representation and use. 2010.
  24. Collins, T. D., Mulholland, P., and Zdrahal, Z., Using mobile phones to map online community resources to a physical museum space. Int. J. Web Based Communities 5 (November 2009). P. 18-32.
  25. Aho, A., Ullman, J. The Theory of Parsing, Translation, and Compiling (Russian edition). Moscow: Mir. 1978.
  26. Russell, S., Norvig, P. Artificial Intelligence: A Modern Approach (Russian edition) / transl. from English and ed. by K. A. Ptitsyn. 2nd ed. Moscow: Williams. 2006.
  27. Kluegl, P., Atzmueller, M., and Puppe, F. TextMarker: A Tool for Rule-Based Information Extraction // Proc. Unstructured Information Management Architecture UIMA, 2nd UIMA@GSCL Workshop. 2009 Conference of the GSCL Gesellschaft für Sprachtechnologie und Computerlinguistik. 2009.
  28. Indurkhya, N. and Damerau, F. J. Handbook of Natural Language Processing (2nd ed.). Chapman & Hall/CRC. 2010.
  29. Hogenboom, F., Frasincar, F., Kaymak, U., and de Jong, F. An Overview of Event Extraction from Text // Workshop on Detection, Representation, and Exploitation of Events in the Semantic Web (DeRiVE 2011) at Tenth International Semantic Web Conference (ISWC 2011). 2011. V. 779. P. 48-57. CEUR-WS.org.
  30. Rozhnov, A. V., Zharkov, I. D. Algorithmization of intelligent data processing in problems of weakly formalized systems (in Russian) // Neurocomputers: Development, Application. 2008. No. 1-2.