Journal Neurocomputers, No. 2, 2017
Article in issue:
Analysis of deep neural network characteristics in the context of the digit recognition task
Authors:
Y.S. Fedorenko - Post-graduate Student, Department «Information Processing and Control Systems», Bauman Moscow State Technical University. E-mail: Fedyura1992@yandex.ru
Yu.E. Gapanyuk - Ph.D. (Eng.), Associate Professor, Department «Information Processing and Control Systems», Bauman Moscow State Technical University. E-mail: gapyu@bmstu.ru
Abstract:
After the successful training of neural networks with many hidden layers, the decline of interest in neural networks in the late 1990s and early 2000s was replaced by a new wave of development in this area of artificial intelligence, which has been named deep learning. The prime advantage of deep neural networks is that they automatically build data representations, so there is no need to construct feature vectors by hand. The simplest example of a deep neural network is the multilayer perceptron. The goal of perceptron training is to approximate the function describing the dependence between input features and the output, based on a set of input-output pairs. It has been proved theoretically that a neural network with one hidden layer can approximate any function; however, there are no guarantees about the size of such a network or the possibility of finding the correct weights. In practice, a neural network with a large number of layers often reduces the number of elements required to build the necessary function and lowers the test set error. For a long time the main difficulties were connected with training deep architectures. In recent years, however, layer-dependent weight initialization and the rectified linear unit (ReLU) activation function have made this task solvable. The ReLU activation function leads to sparse representations, speeds up training (which allows neural networks to be scaled to more complex tasks) and diminishes the vanishing gradient problem, since its derivative over the linear part is constant. Regularization techniques, such as classic L1 and L2 regularization or early stopping, are also essential. Together, these ideas make it possible to build deep models with good generalization ability. Experiments with deep neural networks on the handwritten digit recognition task demonstrate that, with the ReLU activation function, adding hidden layers improves classification results and reduces the number of learned parameters. However, beyond a certain point, further increasing the number of hidden layers leads to a higher test set error because of overfitting. In practice, the optimal number of hidden layers has to be estimated heuristically.
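The ingredients discussed in the abstract - a multilayer perceptron with several ReLU hidden layers, L2 regularization and early stopping, evaluated on handwritten digits - can be illustrated with a minimal sketch. The snippet below is not the authors' experimental setup (their network sizes, dataset split and framework are not reproduced here); it uses scikit-learn's MLPClassifier and the bundled 8x8 digits dataset as a stand-in for MNIST, with illustrative hyperparameter values.

```python
# Illustrative sketch only: a small MLP with ReLU hidden layers,
# L2 regularization (alpha) and early stopping, trained on the
# scikit-learn 8x8 digits dataset as a stand-in for MNIST.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)              # 1797 samples, 64 features
X_train, X_test, y_train, y_test = train_test_split(
    X / 16.0, y, test_size=0.25, random_state=0)  # scale pixels to [0, 1]

# Two ReLU hidden layers (sizes chosen arbitrarily for this sketch);
# early_stopping holds out part of the training set and stops when the
# validation score stops improving, which limits overfitting.
clf = MLPClassifier(hidden_layer_sizes=(128, 64),
                    activation='relu',
                    alpha=1e-4,          # L2 penalty
                    early_stopping=True,
                    max_iter=300,
                    random_state=0)
clf.fit(X_train, y_train)
print('test accuracy:', clf.score(X_test, y_test))
```

Adding or removing hidden layers in `hidden_layer_sizes` is one simple way to reproduce the qualitative effect described above: accuracy first improves with depth and then degrades once the network starts to overfit.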
Pages: 24-30