T.K. Biryukova
FRC «Computer Science and Control» of RAS (Moscow, Russia)
In classic neural networks, the trainable parameters are just the neuron weights. This paper proposes parabolic integro-differential splines (ID-splines), developed by the author, as a new kind of activation function (AF) for neural networks, in which the ID-spline coefficients are also trainable parameters. The parameters of the ID-spline AF vary during training together with the neuron weights in order to minimize the loss function, thus reducing the training time and increasing the operation speed of the neural network.
The newly developed algorithm enables a software implementation of the ID-spline AF as a tool for neural network construction, training and operation. It is proposed to use the same ID-spline AF for all neurons in a layer, but different AFs for different layers. In this case, the parameters of the ID-spline AF of a particular layer change during training independently of the activation functions (AFs) of the other network layers.
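The idea of a per-layer trainable activation can be sketched as follows. This is a simplified stand-in, not the author's implementation: piecewise-linear interpolation in NumPy replaces the parabolic ID-spline, and the node values `f` play the role of the trainable spline parameters, updated by gradient descent alongside the layer weights.

```python
import numpy as np

class TrainableSplineAF:
    """Layer-wise activation with trainable values on a fixed node grid.

    Simplified sketch: linear interpolation stands in for the parabolic
    ID-spline; the node values `f` are the trainable parameters.
    """

    def __init__(self, x_min=-3.0, x_max=3.0, n_nodes=13):
        self.nodes = np.linspace(x_min, x_max, n_nodes)
        self.f = np.maximum(self.nodes, 0.0)  # initialize near ReLU

    def __call__(self, x):
        # Interpolate the trainable node values at the inputs x.
        return np.interp(x, self.nodes, self.f)

    def grad_step(self, x, upstream_grad, lr=1e-2):
        # d(out)/d(f_k) is the interpolation weight of node k at each x;
        # accumulate these weighted by the upstream gradient and take a
        # gradient-descent step on the node values.
        idx = np.clip(np.searchsorted(self.nodes, x) - 1,
                      0, len(self.nodes) - 2)
        t = (x - self.nodes[idx]) / (self.nodes[idx + 1] - self.nodes[idx])
        g = np.zeros_like(self.f)
        np.add.at(g, idx, (1.0 - t) * upstream_grad)
        np.add.at(g, idx + 1, t * upstream_grad)
        self.f -= lr * g
```

One such object would be shared by all neurons of a layer, so each layer learns its own activation shape independently of the others.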
In order to comply with the continuity condition for the derivative of the parabolic ID-spline on the interval [x_0, x_n], its parameters f_i (i = 0, ..., n) should be calculated from a tridiagonal system of linear algebraic equations.
To solve the system, two more equations arising from the boundary conditions of the specific problem are needed. For example, the values of the grid function (if they are known) at the endpoints x_0 and x_n may be used: f_0 = f(x_0), f_n = f(x_n). The parameters I_i^{i+1} (i = 0, ..., n−1) are used as trainable parameters of the neural network.
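A tridiagonal system of this kind can be solved in O(n) with the Thomas algorithm. The sketch below is generic: the actual band coefficients and right-hand side would come from the ID-spline continuity conditions and the boundary equations above, which are given in the paper and are not reproduced here.

```python
import numpy as np

def solve_tridiagonal(a, b, c, d):
    """Thomas algorithm for a tridiagonal linear system.

    a: sub-diagonal (length n-1), b: main diagonal (length n),
    c: super-diagonal (length n-1), d: right-hand side (length n).
    Returns x such that a[i-1]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i].
    """
    n = len(b)
    cp = np.empty(n - 1)
    dp = np.empty(n)
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    # Forward elimination.
    for i in range(1, n):
        m = b[i] - a[i - 1] * cp[i - 1]
        if i < n - 1:
            cp[i] = c[i] / m
        dp[i] = (d[i] - a[i - 1] * dp[i - 1]) / m
    # Back substitution.
    x = np.empty(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

The linear cost matters here because the system must be re-solved whenever the trainable spline parameters change during training.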
The grid boundaries and the spacing of the nodes of the ID-spline AF are best chosen experimentally. A good choice of grid nodes improves the quality of the results produced by the neural network. The formula for a parabolic ID-spline is such that the computational cost does not depend on whether the node grid is uniform or non-uniform.
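The independence of evaluation cost from node spacing can be illustrated: locating the subinterval containing an input is a binary search, which works identically on uniform and non-uniform grids (the grids below are illustrative, not taken from the paper).

```python
import numpy as np

# Locating the containing subinterval is O(log n) via binary search
# regardless of node spacing, so per-sample evaluation cost of the
# spline AF is the same for uniform and non-uniform grids.
uniform = np.linspace(-3.0, 3.0, 13)
nonuniform = np.sort(np.concatenate([np.linspace(-3.0, 0.0, 9),
                                     np.linspace(0.5, 3.0, 4)]))
x = np.array([-2.7, -0.1, 1.4])
for grid in (uniform, nonuniform):
    idx = np.clip(np.searchsorted(grid, x) - 1, 0, len(grid) - 2)
    # Each input indeed falls inside its located subinterval.
    assert np.all((grid[idx] <= x) & (x <= grid[idx + 1]))
```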
An experimental comparison was carried out of image classification on the popular FashionMNIST dataset by convolutional neural networks with ID-spline AFs and with the well-known ReLU AF, ReLU(x) = 0 for x < 0 and ReLU(x) = x for x ≥ 0. The results reveal that ID-spline AFs provide better accuracy of neural network operation than the ReLU AF. Training a network with two convolutional layers and two ID-spline AFs takes only about 2 times longer than with two instances of the ReLU AF. Doubling the training time due to the more complex ID-spline formula is an acceptable price for the significantly better accuracy of the network, while the difference in operation speed between networks with ID-spline and ReLU AFs is negligible. Trainable ID-spline AFs make it possible to simplify the architecture of neural networks without losing their efficiency.
Modifying well-known neural networks (ResNet etc.) by replacing traditional AFs with ID-spline AFs is a promising approach to increasing operation accuracy. In most cases, such a substitution does not require training the network from scratch, because the neuron weights pre-trained on large datasets and supplied by standard neural-network libraries can be reused, substantially shortening the training time.
Biryukova T.K. Signal processing algorithm for neural networks with integrodifferential splines as an activation function and its particular case of image classification. Highly Available Systems. 2021. V. 17. № 2. P. 11–25. DOI: https://doi.org/10.18127/j20729472-202102-02 (in Russian)
- Vershinina A.V., Budzko V.I., Macko N.A. Nekotorye aspekty vozniknoveniya i primeneniya metodov iskusstvennogo intellekta. V sb.: Sistemnyj analiz i informacionnye tekhnologii SAIT-2019. Trudy VIII Mezhdunar. konferencii. 2019. S. 402–406 (in Russian).
- Sokolov I.A., Budzko V.I., Kalinichenko L.A., Sinicin I.N., Stupnikov S.A. Razvitie rabot v oblasti «Bol'shih Dannyh» v Rossijskoj akademii nauk. Sistemy komp'yuternoj matematiki i ih prilozheniya. 2015. № 16. S. 103–110 (in Russian).
- Budzko V.I. Razvitie sistem vysokoj dostupnosti s primeneniem tekhnologij «Bol'shie Dannye». Sistemy vysokoj dostupnosti. 2013. T. 9. № 4. S. 3–15 (in Russian).
- Osipov G.S. Metody iskusstvennogo intellekta. M.: Fizmatlit. 2011. 296 s. (in Russian).
- Osipov G.S. Lekcii po iskusstvennomu intellektu. M.: KRASAND. 2009. 272 s. (in Russian).
- Biryukova T.K. Postroenie nejronnyh setej razlichnyh tipov s ispol'zovaniem parabolicheskih integrodifferencial'nyh splajnov kak funkcij aktivacii. Sistemy vysokoj dostupnosti. 2020. T. 16. № 4. S. 40−49. DOI: 10.18127/j20729472-202004-0 (in Russian).
- Kireev V.I., Biryukova T.K. Integrodifferencial'nyj metod obrabotki informacii i ego primenenie v chislennom analize. M.: IPI RAN. 2014. 267 s. (in Russian).
- Samy Sadek, Ayoub Al-Hamadi, Bernd Michaelis, Usama Sayed. Image Retrieval Using Cubic Splines Neural Networks. International Journal of Video & Image Processing and Network Security IJVIPNS-IJENS. 2009. V. 09. № 10. P. 5–9.
- Campolucci P., Capperelli F., Guarnieri S., Piazza F., Uncini A. Neural networks with adaptive spline activation function. Proceedings of 8th Mediterranean Electrotechnical Conference on Industrial Applications in Power Systems, Computer Science and Telecommunications (MELECON 96). IEEE. 1996. P. 1442–1445.
- Vecci L., Piazza F., Uncini A. Learning and approximation capabilities of adaptive spline activation function neural networks. Neural Networks. 1998. № 11. P. 259–270.
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Deep Residual Learning for Image Recognition. arXiv preprint arXiv:1512.03385. 2015. 12 p. https://arxiv.org/abs/1512.03385
- Aston Zhang, Zack C. Lipton, Mu Li, Alex J. Smola, Brent Werness, Rachel Hu, Shuai Zhang, Yi Tay, Anirudh Dagar, Yuan Tang. Dive into Deep Learning. https://d2l.ai/index.html
- Sholle F. Glubokoe obuchenie na Python. SPb.: Piter. 2018. 400 s. (in Russian).
- Sebastian Ruder. An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747. 2016. 14 p. https://arxiv.org/abs/1609.04747
- Volkov E. A. Chislennye metody. M.: Nauka. 1982. 254 s. (in Russian).