Multimodel data processing method for solving multimodal video lecture processing task


Journal: Highly Available Systems, № 1, 2026


Type of article: scientific article

DOI: https://doi.org/10.18127/j20729472-202601-17

UDC: 004.85

Keywords: multimodal video transformation; video lecture; multimodal processing method; multimodel pipeline; modality separation; machine learning; large language models; interpretability; automatic text generation; multi-agent system

Authors:

M.E. Ismagulov1

1 Yugra State University (Khanty-Mansiysk, Russia)

1 m_ismagulov@ugrasu.ru

Abstract:

Problem statement. Multimodal video-lecture-to-text transformation is commonly addressed using multimodal large language models; however, their application is limited by context loss in long videos, low interpretability of the results, and high computational and data requirements.

Objective. To develop a multimodel pipeline-based method for multimodal video lecture transformation, in which a video lecture is decomposed into separate modalities processed by specialized models, while a large language model is used only at the final stage for text structuring.

Results. A multimodel transformation algorithm has been implemented in the form of a multi-agent system, providing a complete processing cycle for one of the target video lecture formats.

Practical significance. The proposed method can be applied to automatic generation of structured textual representations of online courses, preparation of educational materials, and automated documentation of webinars and scientific events.
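The pipeline idea stated in the objective can be illustrated with a minimal sketch. It is not the author's implementation: the abstract names no concrete models, agents, or interfaces, so every identifier here (Segment, speech_stub, slide_stub, structurer_stub, run_pipeline) is hypothetical, and the "specialized models" and the final language-model stage are replaced by trivial stand-in functions. The sketch only shows the control flow: each modality goes to its own model, the timestamped outputs are merged, and a single structuring step runs last.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Segment:
    """One timestamped piece of text produced from a single modality."""
    start: float   # seconds from the beginning of the lecture
    end: float
    modality: str  # e.g. "speech" or "slide" (labels are assumptions)
    text: str


# Stand-in for raw modality data: already-extracted (start, end, text) tuples.
Raw = List[Tuple[float, float, str]]


def speech_stub(raw: Raw) -> List[Segment]:
    """Placeholder for a speech-recognition model (hypothetical)."""
    return [Segment(s, e, "speech", t) for s, e, t in raw]


def slide_stub(raw: Raw) -> List[Segment]:
    """Placeholder for a slide-text recognition model (hypothetical)."""
    return [Segment(s, e, "slide", t) for s, e, t in raw]


def structurer_stub(segments: List[Segment]) -> str:
    """Placeholder for the final structuring stage (an LLM in the abstract)."""
    return "\n".join(
        f"[{seg.start:.0f}-{seg.end:.0f}s] {seg.modality}: {seg.text}"
        for seg in segments
    )


def run_pipeline(
    modalities: Dict[str, Raw],
    models: Dict[str, Callable[[Raw], List[Segment]]],
    structurer: Callable[[List[Segment]], str],
) -> Tuple[List[Segment], str]:
    """Route each modality to its model, merge by time, structure once."""
    segments: List[Segment] = []
    for name, raw in modalities.items():
        segments.extend(models[name](raw))
    # Merge the per-modality outputs into one timeline.
    segments.sort(key=lambda seg: (seg.start, seg.modality))
    return segments, structurer(segments)
```

Returning the merged segments alongside the final text keeps the intermediate results inspectable, which is one plausible reading of the interpretability concern raised in the problem statement.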

Pages: 85-89

For citation

Ismagulov M.E. Multimodel data processing method for solving multimodal video lecture processing task. Highly Available Systems. 2026. V. 22. № 1. P. 85−89. DOI: https://doi.org/10.18127/j20729472-202601-17 (in Russian)

References

Xie T., Kuang Y., Tang Y., Liao J., Yang Y. Using LLM-supported lecture summarization system to improve knowledge recall and student satisfaction. Expert Systems with Applications. 2025. V. 269. Art. 126371. https://doi.org/10.1016/j.eswa.2024.126371
Wang J., Kang Z., Wang H. et al. VGR: Visual Grounded Reasoning. arXiv:2506.11991. 2025. https://arxiv.org/abs/2506.11991 (accessed: 13.01.2026).
Nikolaou N., Salazar D., RaviPrakash H. et al. A Machine Learning Approach for Multi-modal Data Fusion for Survival Prediction in Cancer Patients. npj Precision Oncology. 2025. V. 9. Art. 128. https://doi.org/10.1038/s41698-025-00917-6
Shambour Q.Y., Al-Zyoud M.M., Hussein A.H. From Data to Diagnosis: Knowledge-Driven, Explainable AI for Reliable Early Autism Detection. Interdisciplinary Journal of Information, Knowledge, and Management. 2025. V. 20. P. 032. https://doi.org/10.28945/5652
Bely`x A.A., Shajdulin R.F., Gureev K.A., Xaritonov V.A., Alekseev A.O. Princip mnogomodel`nosti v zadachax modelirovaniya individual`ny`x predpochtenij. Upravlenie bol`shimi sistemami. Specz. vy`pusk 30.1: Setevy`e modeli v upravlenii. Perm`, 2010. S. 128–140.
Bessonov P.E., Pivovarov O.G. Prognozirovanie texnicheskogo sostoyaniya ob``ektov nazemny`x kompleksov na osnove principa mnogomodel`nosti. Kosmos. 2011. № 2. S. 45–52.
Ismagulov M.E. Konvejerny`j mul`timodal`ny`j nejrosetevoj metod obrabotki video. Sistemnaya inzheneriya i informacionny`e texnologii. 2025. T. 7. № 1(20). S. 78–85.
Ogundulu O. Methodological Foundations for Merging Structured and Unstructured Sources in ML Pipelines. The American Journal of Engineering and Technology. 2025. V. 7. № 9. P. 159–165. https://doi.org/10.37547/tajet/Volume07Issue09-10
Ismagulov M.E. Repozitorij e`ksperimental`ny`x danny`x i rezul`tatov obrabotki videolekcij v ramkax mnogomodel`nogo konvejera [E`lektronny`j resurs]. GitHub. Rezhim dostupa: https://github.com/MilanIsmagulov/Multimodel-Pipeline-Result.git (data obrashheniya: 09.02.2026).

Date of receipt: 24.02.2026

Approved after review: 26.02.2026

Accepted for publication: 10.03.2026