Journal Neurocomputers. 2019. No. 1.
Article in issue:
Object appearance model and a method of image matching in convolutional neural networks
Type of article: scientific article
DOI: 10.18127/j19998554-201901-03
UDC: 004.93
Authors:

V. V. Borisov – Dr.Sc. (Eng.), Professor, Branch of National Research University “MPEI” (Smolensk)

E-mail: vbor67@mail.ru

O. I. Garanin – Post-graduate Student, National Research University “MPEI”

E-mail: hedgehog91@mail.ru

Abstract:

In this article we propose an original model of object appearance in convolutional neural networks (CNNs). It differs from known solutions by adding a layer of neurons that extracts “deep” features of objects, and by its ability to detect objects and extract their “deep” features in real time, in a single forward pass through the CNN.
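To illustrate how fixed-length “deep” features might be obtained for each detected object from an intermediate CNN layer in a single pass, here is a minimal NumPy sketch. The average-pooling scheme, the box format, and the L2 normalization are illustrative assumptions, not the exact layer proposed in the article.

```python
import numpy as np

def deep_features(feature_map, boxes):
    """Pool the CNN feature map inside each detection box into a
    fixed-length "deep" feature vector per object.
    feature_map: (H, W, C) activations of an intermediate CNN layer.
    boxes: iterable of (y0, x0, y1, x1) in feature-map coordinates."""
    feats = []
    for y0, x0, y1, x1 in boxes:
        region = feature_map[y0:y1, x0:x1, :]   # (h, w, C) crop
        vec = region.mean(axis=(0, 1))          # (C,) average pooling
        feats.append(vec / (np.linalg.norm(vec) + 1e-12))  # L2-normalize
    return np.stack(feats)

# Toy example: a random 8x8 "feature map" with 16 channels, two boxes
fmap = np.random.rand(8, 8, 16)
f = deep_features(fmap, [(0, 0, 4, 4), (4, 4, 8, 8)])
```

Because the pooling reads from an already computed feature map, features for all detections come from one forward pass, which is what enables real-time operation.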

In addition, the article presents a method for matching the objects of one image with the objects of another image based on the degree of difference between the “deep” features of these objects.

Experimental evaluation shows that the proposed method reduces the time needed to detect visual objects, extract their features, and match them across images in the CNN, and thus enables real-time analysis of dynamic visual scenes. The method can be effectively applied to problems of dynamic visual scene analysis, for example, visual representation and real-time multiple object tracking.

The proposed method consists of the following steps: 1) obtaining the “deep” features of object detections in image 1 and image 2 using the proposed model; 2) constructing a distance matrix, each element of which is the distance between the “deep” features of one detection from image 1 and one detection from image 2; 3) establishing the matching between the detections of image 1 and the detections of image 2 using the distance matrix.
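Steps 2 and 3 above can be sketched as follows. The Euclidean distance and the greedy minimum-distance assignment are illustrative assumptions: the article only specifies that the matching is established via the distance matrix.

```python
import numpy as np

def match_detections(feats1, feats2):
    """Match detections of image 1 to detections of image 2.
    Step 2: distance matrix d[i, j] = ||feats1[i] - feats2[j]||.
    Step 3: greedy minimum-distance one-to-one assignment (an
    illustrative choice of assignment over the matrix)."""
    d = np.linalg.norm(feats1[:, None, :] - feats2[None, :, :], axis=2)
    pairs, used = [], set()
    for i in np.argsort(d.min(axis=1)):   # most confident rows first
        for j in np.argsort(d[i]):        # nearest still-free column
            if j not in used:
                pairs.append((int(i), int(j)))
                used.add(int(j))
                break
    return sorted(pairs), d

# Toy "deep" features for two detections in each image
feats1 = np.array([[1.0, 0.0], [0.0, 1.0]])
feats2 = np.array([[0.0, 0.9], [0.9, 0.1]])
pairs, dist = match_detections(feats1, feats2)
# pairs: detection 0 of image 1 -> detection 1 of image 2, and 1 -> 0
```

An optimal one-to-one assignment (e.g. the Hungarian algorithm) could replace the greedy loop without changing the interface.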

Pages: 9-18
References
  1. Ahonen T., Hadid A., Pietikainen M. Face recognition with local binary patterns // Proc. 8th European Conference on Computer Vision (ECCV). 2004. P. 469–481.
  2. Bay H., Tuytelaars T., Gool L.V. SURF: Speeded up robust features // Proc. 9th European Conference on Computer Vision (ECCV). 2006. P. 404–417.
  3. Luo W., Xing J. Multiple object tracking: a literature review [Electronic resource] // arXiv.org. URL: https://arxiv.org/abs/1409.7618 (date of access: 14.03.2018).
  4. Wojke N., Bewley A., Paulus D. Simple online and realtime tracking with a deep association metric [Electronic resource] // arXiv.org. URL: https://arxiv.org/abs/1703.07402 (date of access: 24.02.2018).
  5. Sadeghian A., Alahi A., Savarese S. Tracking the untrackable: learning to track multiple cues with long-term dependencies [Electronic resource] // arXiv.org. URL: https://arxiv.org/abs/1701.01909 (date of access: 24.02.2018).
  6. Held D., Thrun S., Savarese S. Learning to track at 100 FPS with deep regression networks [Electronic resource] // arXiv.org. URL: https://arxiv.org/abs/1604.01802 (date of access: 24.02.2018).
  7. Zhai M., Roshtkhari M., Mori G. Deep learning of appearance models for online object tracking [Electronic resource] // arXiv.org. URL: https://arxiv.org/abs/1607.02568 (date of access: 24.02.2018).
  8. Liu W., Anguelov D., Erhan D., Szegedy C., Reed S. SSD: single shot multibox detector [Electronic resource] // arXiv.org. URL: https://arxiv.org/abs/1512.02325 (date of access: 24.02.2018).
  9. Garanin O.I. A method for extracting the empirical receptive field of a convolutional neural network // Nejrokomp'yutery: razrabotka, primenenie. 2017. No. 3. P. 63–69.
  10. Garanin O.I. A method for tuning a multi-scale model of visual object detection in a convolutional neural network // Nejrokomp'yutery: razrabotka, primenenie. 2018. No. 2. P. 50–56.
  11. Simonyan K., Zisserman A. Very deep convolutional networks for large-scale image recognition // Proc. 3rd International Conference on Learning Representations (ICLR). 2015.
  12. Borisov V., Garanin O. A method of dynamic visual scene analysis based on convolutional neural network // Proc. 16th Russian Conference on Artificial Intelligence (RCAI-2018). 2018. P. 60–69.
Date of receipt: October 27, 2018.