Journal Neurocomputers №6 for 2023
Comparing the performance of deep learning models running on client device and in cloud
Type of article: scientific article
DOI: https://doi.org/10.18127/j19998554-202306-03
UDC: 004.8

A.I. Kanev1

1 Bauman Moscow State Technical University (Moscow, Russia)

1 aikanev@bmstu.ru


The large volume of images, audio, and video on the Internet calls for applications that analyze this content semantically and deliver better search results. Modern deep learning models cope successfully with many tasks involving the analysis of images, video, and other types of data.

Training and inference of neural networks are most often done in Python using deep learning frameworks. This is convenient because of the large number of libraries available for preprocessing various types of data. Modern web technologies (HTML5, WebAssembly, WebGL, etc.) make it possible to carry out a growing share of computation on the client side, in the browser. In turn, technologies such as ONNX and TensorFlow.js make it possible to run deep learning models directly in the browser. This saves server resources, limits the load on the network, and keeps user data private.

This makes it possible to build distributed systems that move neural network inference to front-end applications written in JavaScript. However, client devices have less powerful processors and GPUs, so they perform inference more slowly. Inference time therefore needs to be compared across different devices and programming languages.

The aim of the study is to compare the execution time of deep learning models on the client and server sides. To do this, several versions of neural networks with different numbers of parameters were trained on an image classification task. Of these, the three models with the highest accuracy were selected and saved in the ONNX format.

Prediction times for several neural network variants converted to the ONNX format were measured on image classification with the CIFAR-100 dataset. Execution time for these models was measured on Google Colab servers with a CPU and a GPU, as well as on client devices with different CPUs in the Chrome browser. The results showed that the ResNet20 network has the longest prediction time on all platforms. This is because it has more parameters and more connections between neurons; in turn, it has the best accuracy. Even so, on a client with a weak processor this model makes 79 predictions per second.
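The measurement described above can be illustrated with a minimal timing sketch. It is not the harness used in the study: the function names, the number of runs, the warm-up count, and the `dummy_predict` stand-in are all assumptions made for illustration. In a real measurement, `dummy_predict` would be replaced by a call that runs one image through the model.

```python
import time
from statistics import mean
from typing import Callable, List


def measure_latency(predict: Callable[[], object],
                    runs: int = 100, warmup: int = 10) -> float:
    """Return the mean wall-clock time of one predict() call, in seconds."""
    for _ in range(warmup):  # warm-up calls are excluded from the timing
        predict()
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        predict()
        samples.append(time.perf_counter() - start)
    return mean(samples)


def predictions_per_second(mean_latency_s: float) -> float:
    """Convert a mean per-prediction latency into throughput."""
    return 1.0 / mean_latency_s


def dummy_predict() -> int:
    # Hypothetical stand-in for a real model call (not from the article).
    return sum(i * i for i in range(1000))


if __name__ == "__main__":
    latency = measure_latency(dummy_predict)
    print(f"mean latency: {latency * 1e3:.3f} ms, "
          f"throughput: {predictions_per_second(latency):.1f} predictions/s")
```

Under this conversion, a rate of 79 predictions per second corresponds to a mean prediction time of roughly 12.7 ms.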

For all models, the prediction time on the client device in a web browser was an order of magnitude longer than on the server. However, distributing computation across several client devices can compensate for the slower inference. The approach of developing distributed applications that run neural networks on the client side is therefore relevant when it is necessary to save server computing resources, process private user data without sending it over the network, and save the time spent transmitting data over the network. The approach described in the article can be used to develop distributed applications for which privacy and low network latency are important.
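The compensation argument can be made concrete with a short back-of-the-envelope sketch. It assumes that inputs are independent, that clients work fully in parallel, and that coordination costs are zero. The server rate used here is a hypothetical value chosen to be ten times the client rate; only the client figure of 79 predictions per second comes from the text.

```python
import math


def clients_needed(server_rate: float, client_rate: float) -> int:
    """Smallest number of clients whose combined throughput reaches server_rate.

    Assumes independent inputs and ignores any coordination overhead.
    """
    if server_rate <= 0 or client_rate <= 0:
        raise ValueError("rates must be positive")
    return math.ceil(server_rate / client_rate)


CLIENT_RATE = 79.0   # predictions/s, reported for ResNet20 on a weak client
SERVER_RATE = 790.0  # hypothetical: one order of magnitude above the client

if __name__ == "__main__":
    print(clients_needed(SERVER_RATE, CLIENT_RATE))
```

This balances aggregate throughput only: each individual prediction on a client still takes as long as before.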

Pages: 27-36
For citation

Kanev A.I. Comparing the performance of deep learning models running on client device and in cloud. Neurocomputers. 2023. V. 25. № 6. P. 27-36. DOI: https://doi.org/10.18127/j19998554-202306-03 (In Russian)

Date of receipt: 13.10.2023
Approved after review: 01.11.2023
Accepted for publication: 26.11.2023