Grid system architecture for solving maintenance and repair tasks of knowledge-intensive products based on dynamic management of computational node states

Journal: Highly Available Systems, No. 3, 2025

Type of article: scientific article

DOI: https://doi.org/10.18127/j20729472-202503-03

UDC: 004.75:629.7.083

Keywords: Grid systems, maintenance and repair, dynamic resource management, containerization, functional transformation of nodes

Authors:

A.V. Leonov1, V.I. Munerman2, I.N. Sinitsyn3

1, 2 Smolensk State University (Smolensk, Russia)
3 Federal Research Center «Computer Science and Control» of the Russian Academy of Sciences (Moscow, Russia)
1 alexsandr.leo@yandex.ru, 2 vimoon@gmail.com, 3 sinitsin@dol.ru

Abstract:

Problem Statement. Modern maintenance and repair systems for knowledge-intensive products face critical limitations when processing large volumes of diagnostic data under variable load. Traditional Grid architectures distribute functions among servers statically, which leads to inefficient resource utilization. The main drawback of existing approaches is that each server has a fixed specialization: when system requirements change, computational resources cannot be redistributed quickly without lengthy reconfiguration. For example, servers with expensive pre-installed software for aerodynamic modeling sit idle when there are no corresponding tasks, yet they cannot be used efficiently for other types of calculations. The result is a paradoxical state in which some servers are critically overloaded while other nodes remain idle, so available resources are used suboptimally on both counts.

Objective. To develop an innovative Grid system architecture that dynamically transforms servers between different functional states in order to solve maintenance and repair tasks for knowledge-intensive products efficiently, taking into account the geographical distribution of resources. The research aims to change the underlying paradigm: each physical server is treated not as a carrier of fixed functions but as a universal platform that can take on different functional roles depending on current system needs. To this end it relies on modern container technologies, load balancing algorithms such as adapted versions of Follow-the-Sun, and software tools that implement dynamic node state management through centralized orchestration, including Docker, Kubernetes, and Apache Mesos.

Results. An innovative Grid system architecture is proposed, based on the dynamic functional transformation of servers by means of a centralized state table and container technologies. Its foundation is a centralized node state management system built on a basic state table that is replicated across all servers and clusters of the Grid system. A three-level classification of node states has been developed: fully functional nodes with specialized software completely deployed, partially functional nodes with a basic configuration ready for rapid deployment, and potential nodes that serve as a strategic reserve. An adapted Follow-the-Sun load balancing algorithm that accounts for time zones has been created; it automatically redirects computational tasks to the branches with maximum business activity, ensuring efficient round-the-clock use of global resources. A mathematical model for optimizing server transformations is presented, together with a hardware-software architecture and a description of how its key components interact: Grid Controller, Resource Manager, Task Scheduler, Container Orchestrator, and State Synchronization Manager. Docker containers are used to implement rapid server transformations between functional states; transforming a node from the potential state into a fully functional specialized system takes from several seconds to several minutes.
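The node state classification and the time-zone-aware balancing described in the Results can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `Node`, `pick_node`, and `transform`, the 09:00 to 18:00 business-hours window, the relative transformation costs, and the reading of "maximum business activity" as "local time within working hours" are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import List, Optional


class NodeState(Enum):
    FULL = "fully functional"         # specialized software deployed
    PARTIAL = "partially functional"  # basic configuration, ready for rapid deployment
    POTENTIAL = "potential"           # strategic reserve


@dataclass
class Node:
    name: str
    branch: str
    utc_offset_hours: int             # hypothetical: time zone of the branch
    state: NodeState
    role: Optional[str] = None        # e.g. "cfd" or "fea"; None if nothing is deployed
    busy: bool = False


# Hypothetical relative costs of bringing a node to the fully functional state.
TRANSFORM_COST = {NodeState.FULL: 0, NodeState.PARTIAL: 1, NodeState.POTENTIAL: 2}


def in_business_hours(node: Node, now_utc: datetime, start: int = 9, end: int = 18) -> bool:
    """Assumed activity test: the branch's local hour lies in [start, end)."""
    local = now_utc + timedelta(hours=node.utc_offset_hours)
    return start <= local.hour < end


def pick_node(state_table: List[Node], role: str, now_utc: datetime) -> Optional[Node]:
    """Choose an idle node: active branches first, then the cheapest transformation."""
    def rank(node: Node):
        if node.state is NodeState.FULL and node.role != role:
            # Assumption: a node specialized for another role must be redeployed.
            cost = TRANSFORM_COST[NodeState.PARTIAL]
        else:
            cost = TRANSFORM_COST[node.state]
        return (not in_business_hours(node, now_utc), cost, node.name)

    idle = [n for n in state_table if not n.busy]
    return min(idle, key=rank) if idle else None


def transform(node: Node, role: str) -> None:
    """Record the transformation in the state table (container start is out of scope)."""
    node.state = NodeState.FULL
    node.role = role
    node.busy = True


if __name__ == "__main__":
    table = [
        Node("smolensk-1", "Smolensk", 3, NodeState.PARTIAL),
        Node("vladivostok-1", "Vladivostok", 10, NodeState.FULL, role="cfd"),
        Node("moscow-1", "Moscow", 3, NodeState.POTENTIAL),
    ]
    now = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)
    chosen = pick_node(table, "cfd", now)
    if chosen is not None:
        transform(chosen, "cfd")
        print(chosen.name, chosen.state.value)
```

In this sketch the state table is an in-memory list; replicating it across servers and clusters, and actually starting containers, would be handled by the State Synchronization Manager and the Container Orchestrator named in the abstract.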

Practical Significance. The proposed approach achieves a qualitatively new level of computational resource utilization by eliminating the downtime caused by static function distribution. When unplanned situations arise, the system can automatically mobilize the necessary computational resources, transforming idle servers into the required configuration within minutes. The dynamic nature of the architecture significantly reduces total cost of ownership through lower equipment requirements, optimized software licensing costs, and reduced energy consumption. The system supports specialized software configurations for computational fluid dynamics, finite element analysis, diagnostics and monitoring, and predictive analytics. Integration with existing corporate maintenance systems, through ERP and monitoring systems, ensures seamless deployment in industrial environments. Together this increases the efficiency of maintenance processes for knowledge-intensive products across the aerospace, energy, and automotive industries.

Pages: 31-45

For citation

Leonov A.V., Munerman V.I., Sinitsyn I.N. Grid system architecture for solving maintenance and repair tasks of knowledge-intensive products based on dynamic management of computational node states. Highly Available Systems. 2025. V. 21. № 3. P. 31−45. DOI: https://doi.org/10.18127/j20729472-202503-03 (in Russian)


Date of receipt: 25.07.2025

Approved after review: 08.08.2025

Accepted for publication: 29.08.2025