I.S. Markov, N.V. Pivovarova
Bauman Moscow State Technical University (Moscow, Russia)
Formulation of the problem. The paper considers the problem of deploying large neural networks with complex architectures on modern devices. Such networks perform well, but their inference speed is sometimes unacceptably low, and the memory required to host them on a device is not always available. The paper briefly describes how these problems can be addressed with pruning and quantization, proposes an unconventional type of neural network that can meet requirements on memory footprint, speed, and quality of operation, and describes approaches to training networks of this type.
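As an illustration of the two size-reduction techniques just mentioned, below is a minimal NumPy sketch of unstructured magnitude pruning and post-training affine int8 quantization. The function names, the 50% sparsity level, and the layer shape are illustrative assumptions, not code from the paper.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Unstructured magnitude pruning: zero out the smallest |w| entries.

    `sparsity` is the fraction of weights removed; 0.5 is an
    arbitrary illustrative setting, not a value from the paper.
    """
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

def quantize_int8(weights):
    """Affine (asymmetric) int8 quantization of a float weight tensor.

    Returns the int8 values plus the scale and zero point needed to
    reconstruct approximate floats: w ~ scale * (q - zero_point).
    """
    w_min, w_max = float(weights.min()), float(weights.max())
    scale = (w_max - w_min) / 255.0          # 256 representable levels
    zero_point = round(-128.0 - w_min / scale)
    q = np.clip(np.round(weights / scale) + zero_point, -128, 127)
    return q.astype(np.int8), scale, zero_point

# int8 storage is ~4x smaller than float32, and the zeros introduced
# by pruning can be stored sparsely for further savings.
w = np.random.randn(256, 256).astype(np.float32)
q, scale, zp = quantize_int8(magnitude_prune(w, sparsity=0.5))
w_approx = scale * (q.astype(np.float32) - zp)   # dequantized approximation
```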
The aim of the work is to describe modern approaches to reducing the size of neural networks with minimal loss of quality and to propose an alternative type of network that is small yet highly accurate.
Results. The proposed type of neural network offers considerable advantages in size and in the flexibility of its layer settings: by varying the layer parameters, one can trade off the size, speed, and quality of the network, although higher accuracy comes at the cost of a larger memory footprint. To train such a small network, it is proposed to use techniques that transfer the complex dependencies learned by a larger, more complex network. After this training procedure, only the small network is used, so it can then be deployed on low-power devices with little memory.
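One common way to realize this teacher-student transfer is knowledge distillation in the spirit of Hinton et al.: the small network is trained to match the softened output distribution of the large one in addition to the true labels. The sketch below shows such a combined loss; the temperature T, the mixing weight alpha, and all names are illustrative assumptions rather than the authors' exact procedure.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; larger T yields softer distributions."""
    z = logits / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Mix a soft-target loss (against the teacher) with a hard-label loss.

    T and alpha are assumed hyperparameters for illustration only.
    """
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    # Cross-entropy between the softened teacher and student outputs;
    # the T^2 factor keeps gradient magnitudes comparable across T.
    soft = -np.mean(np.sum(p_teacher * np.log(p_student + 1e-12), axis=-1)) * T * T
    # Ordinary cross-entropy of the student against the true labels.
    p = softmax(student_logits)
    hard = -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage: a batch of 8 examples with 10 classes.
rng = np.random.default_rng(0)
loss = distillation_loss(rng.normal(size=(8, 10)),
                         rng.normal(size=(8, 10)),
                         rng.integers(0, 10, size=8))
```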
Practical significance. The described methods make it possible to reduce the size of networks with minimal loss of quality. The proposed architecture makes it possible to train simple networks directly, without applying size-reduction techniques afterwards. These networks can work with various kinds of data, be it images, text, or any other information encoded as a numerical vector.
Markov I.S., Pivovarova N.V. Methods for constructing effective neural networks for low-power systems. Dynamics of complex systems. 2021. Vol. 15. No. 2. P. 48–56. DOI: 10.18127/j19997493-202102-05 (in Russian)