M.E. Ismagulov1
1 Yugra State University (Khanty-Mansiysk, Russia)
1 m_ismagulov@ugrasu.ru
Problem statement. Converting multimodal video lectures to text is commonly handled with multimodal large language models, but their use is limited by context loss on long videos, low interpretability of results, and high computational and data requirements.
Objective. To develop a multimodel, pipeline-based method for transforming multimodal video lectures, in which a video lecture is decomposed into separate modalities that are processed by specialized models, while a large language model is used only at the final stage to structure the text.
Results. A multimodel transformation algorithm has been implemented as a multi-agent system that provides a complete processing cycle for one of the target video lecture formats.
Practical significance. The proposed method can be applied to the automatic generation of structured textual representations of online courses, the preparation of educational materials, and the automated documentation of webinars and scientific events.
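The decomposition stated in the Objective can be illustrated with a minimal sketch. It is not the author's implementation: the names `Segment`, `transcribe_audio`, `read_slides`, `merge_by_time`, `structure_text` and `run_pipeline` are hypothetical, the two "specialized models" are stubs that merely wrap pre-extracted text, and the language model is passed in as an ordinary callable. The sketch only shows the control flow: each modality is processed separately, the outputs are merged on a common timeline, and the language model is called once, at the end, on text alone.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Segment:
    """A timestamped piece of text produced by one modality-specific model."""
    start: float    # seconds from the beginning of the lecture
    modality: str   # e.g. "speech" or "slide"
    text: str


# Hypothetical stand-ins for specialized models (speech recognition, slide OCR).
# Here they only wrap already-extracted (timestamp, text) pairs.
def transcribe_audio(audio: List[Tuple[float, str]]) -> List[Segment]:
    return [Segment(t, "speech", words) for t, words in audio]


def read_slides(frames: List[Tuple[float, str]]) -> List[Segment]:
    return [Segment(t, "slide", text) for t, text in frames]


def merge_by_time(streams: List[List[Segment]]) -> List[Segment]:
    """Interleave all modality outputs into a single timeline."""
    merged = [seg for stream in streams for seg in stream]
    return sorted(merged, key=lambda s: s.start)


def structure_text(timeline: List[Segment], llm: Callable[[str], str]) -> str:
    """Final stage: the only place where the language model is called."""
    draft = "\n".join(f"[{s.start:.0f}s][{s.modality}] {s.text}" for s in timeline)
    return llm(draft)


def run_pipeline(lecture: Dict[str, list], llm: Callable[[str], str]) -> str:
    # Assumed mapping from modality name to its specialized processor.
    processors = {"audio": transcribe_audio, "frames": read_slides}
    streams = [processors[name](data) for name, data in lecture.items()
               if name in processors]
    return structure_text(merge_by_time(streams), llm)


if __name__ == "__main__":
    lecture = {
        "audio": [(0.0, "Welcome to the lecture."), (12.0, "Consider gradient descent.")],
        "frames": [(10.0, "Slide 2: Gradient descent")],
    }
    # Placeholder "LLM" that returns the merged draft unchanged.
    print(run_pipeline(lecture, lambda draft: draft))
```

Because every intermediate `Segment` keeps its timestamp and modality label, the text handed to the final stage can be traced back to the model that produced it, which is one way the interpretability concern raised in the problem statement might be addressed in such a design.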
Ismagulov M.E. Multimodel data processing method for solving multimodal video lecture processing task. Highly Available Systems. 2026. V. 22. № 1. P. 85−89. DOI: https://doi.org/10.18127/j20729472-202601-17 (in Russian)
- Xie T., Kuang Y., Tang Y., Liao J., Yang Y. Using LLM-supported lecture summarization system to improve knowledge recall and student satisfaction. Expert Systems with Applications. 2025. V. 269. Art. 126371. https://doi.org/10.1016/j.eswa.2024.126371
- Wang J., Kang Z., Wang H. et al. VGR: Visual Grounded Reasoning. arXiv:2506.11991. 2025. https://arxiv.org/abs/2506.11991 (accessed: 13.01.2026).
- Nikolaou N., Salazar D., RaviPrakash H. et al. A Machine Learning Approach for Multi-modal Data Fusion for Survival Prediction in Cancer Patients. npj Precision Oncology. 2025. V. 9. Art. 128. https://doi.org/10.1038/s41698-025-00917-6
- Shambour Q.Y., Al-Zyoud M.M., Hussein A.H. From Data to Diagnosis: Knowledge-Driven, Explainable AI for Reliable Early Autism Detection. Interdisciplinary Journal of Information, Knowledge, and Management. 2025. V. 20. P. 032. https://doi.org/10.28945/5652
- Bely`x A.A., Shajdulin R.F., Gureev K.A., Xaritonov V.A., Alekseev A.O. Princip mnogomodel`nosti v zadachax modelirovaniya individual`ny`x predpochtenij. Upravlenie bol`shimi sistemami. Specz. vy`pusk 30.1: Setevy`e modeli v upravlenii. Perm`, 2010. S. 128–140.
- Bessonov P.E., Pivovarov O.G. Prognozirovanie texnicheskogo sostoyaniya ob``ektov nazemny`x kompleksov na osnove principa mnogomodel`nosti. Kosmos. 2011. № 2. S. 45–52.
- Ismagulov M.E. Konvejerny`j mul`timodal`ny`j nejrosetevoj metod obrabotki video. Sistemnaya inzheneriya i informacionny`e texnologii. 2025. T. 7. № 1(20). S. 78–85.
- Ogundulu O. Methodological Foundations for Merging Structured and Unstructured Sources in ML Pipelines. The American Journal of Engineering and Technology. 2025. V. 7. № 9. P. 159–165. https://doi.org/10.37547/tajet/Volume07Issue09-10
- Ismagulov M.E. Repozitorij e`ksperimental`ny`x danny`x i rezul`tatov obrabotki videolekcij v ramkax mnogomodel`nogo konvejera [E`lektronny`j resurs]. GitHub. Rezhim dostupa: https://github.com/MilanIsmagulov/Multimodel-Pipeline-Result.git (data obrashheniya: 09.02.2026).

