Cross-platform text generation on edge devices using WebLLM

Journal: Dynamics of Complex Systems - XXI century, №2, 2026

Type of article: scientific article

DOI: https://doi.org/10.18127/j19997493-202602-09

UDC: 004.912

Keywords: Natural language processing, text generation, WebLLM, ROUGE, quantization

Authors:

A.I. Kanev1

1 Bauman Moscow State Technical University (Moscow, Russia)

1 aikanev@bmstu.ru

Abstract:

Large language models are currently used in everyday life for question answering, machine translation, and many other tasks. These applications most often rely on cloud services, which run models with a large number of parameters and require extensive computational resources. Users send all their data over the network, which is a critical concern in medicine, finance, and other fields. Therefore, research into lightweight models that can run directly on the user's computer is relevant. The aim of this study is to investigate the capabilities of distilled and quantized models running in a browser using WebLLM. The study revealed that distilled quantized models are capable of producing answers in a time comparable to popular cloud services. These models demonstrated good performance in machine translation. The responses of models running locally are generated in natural language and contain few grammatical errors. However, these responses are semantically incorrect and tend to be paraphrases of the questions. Therefore, to preserve the meaning of the question, document fragments must be fed to the model's input, which requires technologies such as retrieval-augmented generation (RAG). In addition, devices running local models generate significant heat, which limits their use to relatively short responses rather than very long texts. The results of the study are important for creating systems and applications that process user data and must limit its transmission over the network. These results will also be useful when internet performance is limited, the network is congested, or the connection is lost.

Pages: 79-85

For citation

Kanev A.I. Cross-platform text generation on edge devices using WebLLM. Dynamics of complex systems. 2026. V. 20. № 2. P. 79−85. DOI: 10.18127/j19997493-202602-09 (in Russian).

Date of receipt: 30.01.2026

Approved after review: 13.02.2026

Accepted for publication: 20.02.2026