Journal Neurocomputers №11 for 2014
Article in issue:
Hierarchical system of intellectual analysis and recognition of audio and video objects
Authors:
A. V. Savchenko - Ph.D. (Eng.), Associate Professor, National Research University Higher School of Economics (N. Novgorod); doctoral candidate, Nizhniy Novgorod State Technical University n.a. R.E. Alekseev. E-mail: avsavchenko@hse.ru
V. R. Milov - Dr.Sc. (Eng.), Professor, Head of Department «Electronics and Computer Networks», Nizhniy Novgorod State Technical University n.a. R.E. Alekseev. E-mail: vladimir.milov@gmail.com
Abstract:
A structural scheme of a hierarchical recognition system is developed for the automatic analysis of images and speech signals. The input of the system is one or several audio and/or video streams, which are divided into a sequence of frames of fixed dimensionality. Each frame is then processed by several detectors of the objects of interest. Detected objects are identified and recognized by the next block. First, this block identifies the objects' sufficient properties (characteristics or attributes). To refine some of these attributes, which are defined in the model database, the detected objects are classified in the recognition blocks, where the previously obtained attributes form a set of related parameters that increases the classification accuracy. The recognition algorithm is hierarchical. It first analyzes the coarsest approximations of the query and model objects, e.g., low-resolution images. If a reliable decision can be obtained at this level, the classification terminates. Otherwise, the description of the query object is refined (e.g., the image resolution is increased) and the recognition process is repeated until a reliable decision is obtained, for at most J steps. The maximum number of steps J is usually fixed for each particular task. Each step uses the classification results of the previous step; for instance, in the statistical approach, the prior probabilities of the classes at the next step are set to the posterior probabilities estimated at the previous step. Unreliable decisions are rejected with Chow's rule, which compares the maximal posterior probability with a fixed threshold. The outputs of the identification and recognition blocks are fused in a committee machine block to obtain a single description of the observed objects. The result for each frame is combined with the detection and recognition results of the previous frames to obtain the list of descriptions of the observed objects.
This list, together with the set of attributes of each object of interest, is transmitted to the control subsystem for automatic processing, on-line notification of the decision maker, etc. Practical examples of the use of the proposed system of intellectual analysis of audio and video objects and of its kernel, namely the block of identification and hierarchical recognition, are presented for various image classification and speech recognition tasks.
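As an illustration of the hierarchical recognition loop described in the abstract, the following Python sketch chains the levels so that the posterior probabilities of one step become the priors of the next, and applies Chow's rule after each step. It is not the authors' implementation: the names `hierarchical_recognize`, `threshold` and `sigma`, the Gaussian-kernel likelihood over squared Euclidean distances, the uniform initial priors, the conditional independence of levels implied by multiplying their likelihoods, and the rejection of the query when the J-th level is still unreliable are all assumptions made for the example.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hierarchical_recognize(
    query_levels: List[Vector],
    models: Dict[str, List[Vector]],
    threshold: float = 0.9,
    sigma: float = 1.0,
) -> Tuple[Optional[str], int, Dict[str, float]]:
    """Coarse-to-fine Bayesian classification with Chow's reject rule (sketch).

    query_levels[j] describes the query at level j (j = 0 is the coarsest);
    models[c][j] describes model class c at the same level.
    Returns (label or None if rejected, number of levels used, posteriors).
    """
    classes = list(models)
    J = len(query_levels)  # maximum number of steps, fixed for the task
    # Assumption: the first level starts from uniform class priors.
    log_post = {c: -math.log(len(classes)) for c in classes}
    posterior = {c: math.exp(v) for c, v in log_post.items()}
    for j in range(J):
        # The prior at this step is the posterior of the previous step;
        # multiply it by an assumed Gaussian-kernel likelihood (in log form).
        for c in classes:
            log_post[c] -= sq_dist(query_levels[j], models[c][j]) / (2 * sigma ** 2)
        # Normalize with log-sum-exp to avoid numerical underflow.
        m = max(log_post.values())
        log_z = m + math.log(sum(math.exp(v - m) for v in log_post.values()))
        log_post = {c: v - log_z for c, v in log_post.items()}
        posterior = {c: math.exp(v) for c, v in log_post.items()}
        best = max(posterior, key=posterior.get)
        # Chow's rule: accept only if the maximal posterior reaches the threshold.
        if posterior[best] >= threshold:
            return best, j + 1, posterior
    # No level gave a reliable decision: reject (assumed final-step behaviour).
    return None, J, posterior


# Hypothetical two-class example: level 0 is a 1-D "low-resolution"
# description, level 1 a 2-D "higher-resolution" one.
models = {"A": [[0.0], [0.0, 0.0]], "B": [[1.0], [1.0, 1.0]]}
print(hierarchical_recognize([[0.0], [0.0, 0.0]], models, threshold=0.8, sigma=0.5))
print(hierarchical_recognize([[0.1], [0.1, 0.0]], models, threshold=0.9, sigma=0.5))
print(hierarchical_recognize([[0.5], [0.5, 0.5]], models, threshold=0.9, sigma=0.5))
```

In this toy run, the first query is accepted after the coarse level alone, the second needs both levels before its maximal posterior exceeds 0.9, and the third, equidistant from both models, is rejected. Raising `threshold` trades more refinement steps and more rejections for fewer misclassifications, which is the error/reject trade-off analysed by Chow.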
Pages: 23-30
References

  1. Hawkins J., Blakeslee S. On Intelligence. N.Y.: St. Martin's Griffin. 2005. 272 p.
  2. Baranov V.G., Milov V.R., Zaripova Yu.Kh. Intellektualizatsiya sistemy raspoznavaniya obrazov na osnove sravneniya effektivnosti metodov klassifikatsii [Intellectualization of a pattern recognition system based on comparing the efficiency of classification methods] // Informatsionno-izmeritel'nye i upravlyayushchie sistemy. 2010. № 2. P. 35-38.
  3. Bosch A., Zisserman A., Munoz X. Representing Shape with a Spatial Pyramid Kernel // Proceedings of the 6th ACM International Conference on Image and Video Retrieval CIVR-07. N.Y.: ACM. 2007. P. 401-408.
  4. Munoz D., Bagnell J.A., Hebert M. Stacked Hierarchical Labeling // Proceedings of the 11th European Conference on Computer Vision: Part VI ECCV-10. Berlin, Heidelberg: Springer-Verlag. 2010. P. 57-70.
  5. Utrobin V.A. Vvedenie v teoriyu aktivnogo vospriyatiya [Introduction to the theory of active perception] // Datchiki i sistemy. 2013. № 7(170). P. 34-39.
  6. Gai V.E. Signal comparison algorithm in terms of a priori uncertainty // Pattern Recognition and Image Analysis. 2013. V. 23. № 3. P. 348-351.
  7. LeCun Y., Bottou L., Bengio Y., Haffner P. Gradient-based learning applied to document recognition // Proceedings of the IEEE. 1998. V. 86. № 11. P. 2278-2324.
  8. Cireşan D., Meier U., Masci J., Schmidhuber J. Multi-column deep neural network for traffic sign classification // Neural Networks. 2012. V. 32. P. 333-338.
  9. Savchenko A.V. Directed enumeration method in image recognition // Pattern Recognition. 2012. V. 45. № 8. P. 2952-2961.
  10. Dalal N., Triggs B. Histograms of oriented gradients for human detection // IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005). 2005. P. 886-893.
  11. Chow C.K. On optimum recognition error and reject trade-off // IEEE Transactions on Information Theory. 1970. V. 16. P. 41-46.
  12. Savchenko V.V., Savchenko A.V. Printsip minimal'nogo informatsionnogo rassoglasovaniya v zadache raspoznavaniya diskretnykh ob''ektov [The principle of minimum information discrepancy in the recognition of discrete objects] // Izv. vuzov. Ser. Radioelektronika. 2005. Iss. 3. P. 10-18.
  13. Wang H., Wang Y., Cao Y. Video-based face recognition: a survey // World Academy of Science, Engineering and Technology. 2009. V. 60. P. 293-302.
  14. Bellustin N., Kovalchuck A., Telnykh A., Shemagina O., Yakhno V., Kalafati Y., Vaish A., Sharma P., Verma S. Instant Human Face Attributes Recognition System // IJACSA. International Journal of Advanced Computer Science and Applications, Special Issue on Artificial Intelligence. 2011. P. 112-120.
  15. Theodoridis S., Koutroumbas K. Pattern Recognition, Fourth Edition. Burlington, MA; London: Academic Press. 2008. 984 p.
  16. Savchenko A.V. Adaptive video image recognition system using a committee machine // Optical Memory and Neural Networks. 2012. V. 21. № 4. P. 219-226.
  17. Zhuravlev Yu.I. Ob algebraicheskom podkhode k resheniyu zadach raspoznavaniya ili klassifikatsii [On an algebraic approach to solving recognition or classification problems] // Problemy kibernetiki. 1978. V. 33. P. 5-68.
  18. Savchenko A.V., Khokhlova Ya.I. About neural-network algorithms application in viseme classification problem with face video in audiovisual speech recognition systems // Optical Memory and Neural Networks (Information Optics). 2014. V. 23. № 1. P. 34-42.
  19. RealSpeaker Audio-Visual Speech Recognition - Voice to Text. http://realspeaker.net/
  20. Campr P., Pražák A., Psutka J.V., Psutka J. Online Speaker Adaptation of an Acoustic Model Using Face Recognition // Proceedings of the International Conference on Text, Speech, and Dialogue TSD-2013. LNCS/LNAI. 2013. V. 8082. P. 378-385.
  21. Milov V.R. Adaptivnyy priem signalov [Adaptive signal reception]. N. Novgorod: NGTU. 2005. 15 p.
  22. Pawlak Z. Rough Sets: Theoretical Aspects of Reasoning About Data. Norwell, MA: Kluwer Academic Publishers. 1992.
  23. Wald A. Sequential Analysis. N.Y.: Dover Publications. 2013. 224 p.
  24. Savchenko A.V. Probabilistic neural network with homogeneity testing in recognition of discrete patterns set // Neural Networks. 2013. V. 46. P. 227-241.
  25. Specht D.F. Probabilistic neural networks // Neural Networks. 1990. V. 3. № 1. P. 109-118.
  26. Benesty J., Sondhi M.M., Huang Y. Springer Handbook of Speech Processing. Berlin: Springer. 2008. 1176 p.
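  27. Savchenko L.V. Algoritm pofonemnogo raspoznavaniya ustnoy rechi na osnove metoda nechetkogo foneticheskogo kodirovaniya-dekodirovaniya slov [An algorithm for phoneme-by-phoneme recognition of spoken speech based on fuzzy phonetic encoding-decoding of words] // Informatsionno-upravlyayushchie sistemy. 2014. № 1. P. 23-31.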
  28. Savchenko A.V. Metod foneticheskogo kodirovaniya v zadache raspoznavaniya izolirovannykh slov [A phonetic encoding method for isolated word recognition] // Radiotekhnika i elektronika. 2014. № 4. P. 339-345.
  29. Yale Face Database. http://vision.ucsd.edu/content/yale-face-database
  30. AT&T Database of Faces. http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
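  31. Savchenko A.V. Vybor parametrov algoritma raspoznavaniya izobrazheniy na osnove kollektiva reshayushchikh pravil i printsipa maksimuma aposteriornoy veroyatnosti [Selection of the parameters of an image recognition algorithm based on a committee of decision rules and the maximum a posteriori probability principle] // Komp'yuternaya optika. 2012. V. 36. № 1. P. 117-124.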
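  32. Savchenko A.V., Milov V.R. Veroyatnostnye neyrosetevye modeli i metody raspoznavaniya sostavnykh ob''ektov [Probabilistic neural network models and methods for recognition of composite objects] // Trudy VI Vseros. nauchno-prakt. konf. «Nechetkie sistemy i myagkie vychisleniya-2014» [Proceedings of the 6th All-Russian Scientific and Practical Conference "Fuzzy Systems and Soft Computing 2014"]. SPb.: Politekhnika-servis. 2014. V. 2. P. 200-208.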