A.I. Kanev
Bauman Moscow State Technical University (Moscow, Russia)
aikanev@bmstu.ru
The vast amount of images, audio, and video on the Internet calls for applications that perform semantic analysis of such content and improve search results. Modern deep learning models successfully handle many tasks of analyzing images, video, and other types of data.
Training and inference of neural networks are most often done in Python using deep learning frameworks, which is convenient thanks to the large number of libraries for preprocessing various types of data. At the same time, modern web technologies (HTML5, WebAssembly, WebGL, etc.) make it possible to move a growing share of computation to the client side, in the browser. In particular, technologies such as ONNX and TensorFlow.js make it possible to run deep learning models directly in the browser. This saves server resources, reduces network load, and preserves the privacy of user data.
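As an illustration, below is a minimal sketch of client-side inference with the ONNX Runtime Web library (`onnxruntime-web`); the model file name, input/output names, and input shape are placeholders that depend on how a particular model was exported, not details taken from the study:

```javascript
import * as ort from 'onnxruntime-web';

// Load a model exported to ONNX; the WebAssembly backend runs on the client CPU.
const session = await ort.InferenceSession.create('./classifier.onnx', {
  executionProviders: ['wasm'],
});

// Wrap a preprocessed image in a tensor (here a 1x3x32x32 float input,
// as for CIFAR-sized images). The input name 'input' is model-dependent.
const imageData = new Float32Array(1 * 3 * 32 * 32); // filled by preprocessing
const feeds = { input: new ort.Tensor('float32', imageData, [1, 3, 32, 32]) };

// Inference runs entirely on the client; the image never leaves the device.
const results = await session.run(feeds);
const scores = results.output.data; // output name is also model-dependent
```

Because the model file is fetched once and all subsequent predictions run locally, no user data has to be sent to the server.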
It therefore becomes possible to build distributed systems that offload neural network inference to front-end JavaScript applications. However, client devices with less powerful CPUs and GPUs perform this inference more slowly, so inference time needs to be compared across devices and programming languages.
This study aims to compare the execution time of deep learning models on the client and server sides. To this end, several neural network variants with different numbers of parameters were trained on an image classification task, and the three most accurate models were selected and exported to the ONNX format.
Prediction times for the neural network variants converted to ONNX were measured on the image classification task using the CIFAR-100 dataset. For these models, execution time was measured on Google Colab servers with a CPU and a GPU, as well as on client devices with different CPUs in the Chrome browser. The results showed that the ResNet20 network requires the most inference time on all platforms: it has more parameters and connections between neurons, but it also achieves the best accuracy. Even on a client with a weak processor, this model delivers 79 predictions per second.
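A sketch of how such per-platform timings can be collected in the browser is shown below; the warm-up and run counts are illustrative assumptions rather than the study's exact measurement protocol (`session` and `feeds` are as in the previous sketch):

```javascript
// Measure mean inference latency and throughput for a loaded ONNX session.
async function benchmark(session, feeds, warmup = 10, runs = 100) {
  // Warm-up iterations let JIT compilation and caches settle before timing.
  for (let i = 0; i < warmup; i++) {
    await session.run(feeds);
  }
  const start = performance.now();
  for (let i = 0; i < runs; i++) {
    await session.run(feeds);
  }
  const totalMs = performance.now() - start;
  return {
    meanLatencyMs: totalMs / runs,
    predictionsPerSecond: (runs * 1000) / totalMs,
  };
}
```

On a server, the same models can be timed analogously with the Python onnxruntime package, making the client and server measurements directly comparable.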
For all models, prediction time on a client device in a web browser turned out to be an order of magnitude longer than on the server. However, this slowdown can be compensated by distributing the computation across several client devices. The approach of building distributed applications that run neural networks on the client side is therefore relevant when server computing resources must be conserved, private user data must be processed without being sent over the network, and time spent transmitting information must be saved. The described approach can thus be used to develop distributed applications for which privacy and low network latency are important.
Kanev A.I. Comparing the performance of deep learning models running on client device and in cloud. Neurocomputers. 2023. V. 25. № 6. P. 27–36. DOI: https://doi.org/10.18127/j19998554-202306-03 (In Russian)