Hierarchical system of intellectual analysis and recognition of audio and video objects


A. V. Savchenko – Ph.D. (Eng.), Associate Professor, National Research University Higher School of Economics (N. Novgorod); Doctoral-candidate, Nizhniy Novgorod State Technical University n.a. R.E. Alekseev. E-mail:
V. R. Milov – Dr.Sc. (Eng.), Professor, Head of Department «Electronics and Computer Networks», Nizhniy Novgorod State Technical University n.a. R.E. Alekseev. E-mail:

A structural scheme of a hierarchical recognition system is developed for the automatic analysis of images and speech signals. Its input is one or several audio and/or video streams, which are divided into a sequence of frames of fixed dimensionality. Each frame is then processed by several detectors of the objects of interest. The detected objects are passed to the identification and recognition block. First, this block identifies the objects' essential properties (characteristics or attributes). To refine those attributes that are given by the model database, the detected objects are classified in the recognition blocks. The previously obtained attributes determine the set of related parameters, which increases the classification accuracy. The recognition algorithm is hierarchical. First, it analyzes the coarsest approximations of the query and model objects, e.g., low-resolution images. If a reliable solution can be obtained at this level, the classification algorithm terminates. Otherwise, the description of the query object is made more detailed (e.g., the image resolution is increased) and the recognition process is repeated until a reliable solution is obtained or the J-th step is reached. The maximum number of steps is usually fixed (J = const) for each particular task. Each step uses the classification results of the previous step. For instance, in the statistical approach the prior probabilities of each class at the next step are set equal to the posterior probabilities estimated at the previous step. To reject unreliable solutions, we use Chow's rule, which compares the maximal posterior probability with a fixed threshold. The outputs of the identification and recognition blocks are fused in a committee machine block to obtain a single description of the observed objects. The result for each frame is combined with the detection and recognition results of the previous frames to obtain the list of descriptions of the observed objects.
This list, together with the set of attributes of each object of interest, is transmitted to the control subsystem for automatic processing, on-line notification of the decision maker, etc. Practical examples of the use of the proposed system of intellectual analysis of audio/video objects and of its kernel, namely the block of identification and hierarchical recognition, are presented for various image classification and speech recognition tasks.
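The hierarchical recognition step in the statistical approach can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`bayes_update`, `hierarchical_classify`), the representation of each level of detail as a list of class-conditional likelihoods, the uniform initial prior, and the behaviour at the last step (returning the best class flagged as unreliable) are all assumptions made for illustration. The sketch shows only the two mechanisms named in the abstract: posteriors of one step reused as priors of the next, and Chow's rule as the stopping criterion.

```python
from typing import List, Optional, Sequence, Tuple


def bayes_update(priors: Sequence[float], likelihoods: Sequence[float]) -> List[float]:
    """Return posterior class probabilities from priors and class-conditional likelihoods."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    if total <= 0.0:
        raise ValueError("all classes have zero joint probability")
    return [j / total for j in joint]


def hierarchical_classify(
    likelihoods_per_level: Sequence[Sequence[float]],
    threshold: float,
    priors: Optional[Sequence[float]] = None,
) -> Tuple[int, bool, List[float], int]:
    """Classify a query object from coarse to fine levels of detail.

    likelihoods_per_level[j][c] is the (hypothetical) likelihood of the query
    under class c at the j-th level of detail; its length is the maximum
    number of steps J. Returns (best class, reliable flag, posteriors, steps used).
    """
    if not likelihoods_per_level:
        raise ValueError("at least one level of detail is required")
    n_classes = len(likelihoods_per_level[0])
    # Assumption: start from a uniform prior when none is supplied.
    posteriors = list(priors) if priors is not None else [1.0 / n_classes] * n_classes
    best = 0
    for step, likelihoods in enumerate(likelihoods_per_level, start=1):
        # Posteriors of the previous step serve as priors of the current one.
        posteriors = bayes_update(posteriors, likelihoods)
        best = max(range(n_classes), key=lambda c: posteriors[c])
        # Chow's rule: accept only if the maximal posterior reaches the threshold.
        if posteriors[best] >= threshold:
            return best, True, posteriors, step
    # Assumption: after J steps the best class is returned but marked unreliable.
    return best, False, posteriors, len(likelihoods_per_level)


if __name__ == "__main__":
    levels = [[0.5, 0.3, 0.2], [0.6, 0.2, 0.2]]  # coarse level, then finer level
    label, reliable, post, steps = hierarchical_classify(levels, threshold=0.7)
    print(label, reliable, steps)
```

In this toy run the coarse level alone does not reach the threshold, so the finer level is consulted; with a more discriminative coarse level the loop would stop after the first step, which is the source of the computational savings of the hierarchical scheme.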

May 29, 2020

