A.V. Toutov1
1 V.A. Trapeznikov Institute of Control Sciences of Russian Academy of Sciences (Moscow, Russia)
1 andrew_vidnoe@mail.ru
Data centers must provide sufficient resources for the smooth operation of the applications they host under variable load. Alongside traditional Internet applications, a growing number of applications require high-performance computing: machine learning, big data processing, virtual desktop infrastructure, and others. Because these applications rely heavily on parallel computation, demand for servers with graphics processing units (GPUs) is increasing, and data centers are becoming heterogeneous, combining traditional servers with GPU servers. Many works address virtual machine placement in traditional cloud data centers, but the allocation and provision of GPUs to virtual machines in heterogeneous data centers requires further study.
In this paper, we formulate the problem of optimizing virtual machine placement in heterogeneous data centers and propose a method and a solution algorithm that improve data center energy efficiency while giving virtual machines enough resources to meet service level agreements (SLAs) and keeping resource utilization uniform. The result is a method for the initial placement of virtual machines on servers with NVIDIA Multi-Instance GPU (MIG)-enabled GPUs. It is based on a multi-criteria combinatorial optimization model with binary variables; the criteria are energy consumption, uniformity of resource loading, and SLA violations. An ant colony algorithm is proposed that obtains a solution in acceptable time. According to the simulation results, the proposed method, compared with the First Fit (FF) and Best Fit (BF) heuristics used in practice, yields a solution that is balanced across all three criteria. The method can be implemented in the resource scheduler of a cloud platform for the initial placement of virtual machines with GPUs, which will improve equipment utilization, reduce energy costs, and help ensure that SLAs are met.
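For orientation, the FF and BF baselines named above are assumed here to be the classic First Fit and Best Fit bin-packing heuristics. The Python sketch below illustrates them under strong simplifications that are not taken from the paper: each server is described only by free CPU cores and a count of free GPU slices, real MIG profile and position constraints are ignored, and all class and function names (`Server`, `VM`, `first_fit`, `best_fit`) are hypothetical. It is not the proposed ant colony method, only a minimal picture of the baselines it is compared against.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Server:
    name: str
    cpu_free: int         # free CPU cores (assumed unit)
    gpu_slices_free: int  # free GPU slices (assumed simplification of MIG)
    vms: List[str] = field(default_factory=list)


@dataclass
class VM:
    name: str
    cpu: int
    gpu_slices: int


def fits(server: Server, vm: VM) -> bool:
    """True if the server has enough free CPU cores and GPU slices."""
    return server.cpu_free >= vm.cpu and server.gpu_slices_free >= vm.gpu_slices


def assign(server: Server, vm: VM) -> None:
    """Reserve the VM's resources on the server."""
    server.cpu_free -= vm.cpu
    server.gpu_slices_free -= vm.gpu_slices
    server.vms.append(vm.name)


def first_fit(servers: List[Server], vm: VM) -> Optional[str]:
    """Place the VM on the first server, in list order, that can hold it."""
    for server in servers:
        if fits(server, vm):
            assign(server, vm)
            return server.name
    return None  # no feasible server


def best_fit(servers: List[Server], vm: VM) -> Optional[str]:
    """Place the VM on the feasible server that is left with the least slack.

    Slack is compared on GPU slices first, then on CPU cores; this ordering
    is an assumption made for the sketch.
    """
    candidates = [s for s in servers if fits(s, vm)]
    if not candidates:
        return None
    best = min(
        candidates,
        key=lambda s: (s.gpu_slices_free - vm.gpu_slices, s.cpu_free - vm.cpu),
    )
    assign(best, vm)
    return best.name


if __name__ == "__main__":
    vm = VM("vm1", cpu=4, gpu_slices=2)
    pool_a = [Server("A", 16, 7), Server("B", 8, 2)]
    pool_b = [Server("A", 16, 7), Server("B", 8, 2)]
    print("First Fit:", first_fit(pool_a, vm))
    print("Best Fit:", best_fit(pool_b, vm))
```

Both heuristics look at one virtual machine at a time and optimize a single packing criterion, which is why a multi-criteria model is needed when energy consumption, load uniformity, and SLA violations have to be traded off together.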
Toutov A.V. Method for initial GPU-enabled virtual machine placement in heterogeneous data centers. Neurocomputers. 2025. V. 27. № 5. P. 5–16. DOI: https://doi.org/10.18127/j19998554-202505-01 (in Russian)

