A.S. Sebyakin1
1 Financial University under the Government of the Russian Federation (Moscow, Russia)
1 249702@edu.fa.ru
The paper surveys the theoretical foundations and the current state of AI-based video generation. The main model families are discussed, including GAN approaches, diffusion models, and transformer-based (autoregressive and masked) architectures, with an emphasis on mechanisms that improve spatio-temporal consistency, controllability (conditioning on text or images), and scaling to longer clips. Key groups of video quality metrics are summarized (perceptual, distributional, semantic, and motion/dynamics), and the main practical constraints are outlined: computational cost, generalization to new domains, data requirements, and safeguards against deepfake misuse (watermarking, copyright, and compliance). It is shown how video generation can strengthen digital twins of situation centers by supporting “what-if” visualization and staff training.
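Among the distributional metrics mentioned above, the Fréchet Video Distance (FVD) compares real and generated clips by fitting Gaussians to their embeddings and measuring the Fréchet distance between the two fits. The sketch below shows only that final distance computation; it assumes per-clip feature vectors have already been extracted by some video embedding network (in FVD this is an I3D model, not implemented here):

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """Fréchet distance between Gaussian fits of two feature sets.

    feats_real, feats_gen: (n_clips, d) arrays of per-clip embeddings.
    Returns ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 * (S1 S2)^(1/2)).
    """
    mu1, mu2 = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    s1 = np.cov(feats_real, rowvar=False)
    s2 = np.cov(feats_gen, rowvar=False)
    # Matrix square root of the covariance product; numerical noise can
    # introduce a tiny imaginary component, which is discarded.
    covmean = sqrtm(s1 @ s2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(s1 + s2 - 2.0 * covmean))
```

Identical feature sets yield a distance near zero, while a pure mean shift contributes its squared norm; in practice the metric's usefulness depends entirely on the quality of the (assumed) embedding network.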
Sebyakin A.S. Review of artificial intelligence methods for video content generation and their application in building digital twins of situation centers. Neurocomputers. 2026. V. 28. № 2. P. 78–84. DOI: https://doi.org/10.18127/j19998554-202602-07 (in Russian)
- Ramesh A. et al. Zero-shot text-to-image generation. Proceedings of the 38th International Conference on Machine Learning. 2021. V. 139. P. 8821–8831 [Electronic resource]. URL: https://proceedings.mlr.press/v139/ramesh21a.html (accessed: 12.12.2025).
- Chang H. et al. Muse: Text-to-image generation via masked generative transformers. Proceedings of the 40th International Conference on Machine Learning. 2023. V. 202. P. 4055–4075 [Electronic resource]. URL: https://proceedings.mlr.press/v202/chang23b.html (accessed: 12.12.2025).
- Xing Z. et al. A survey on video diffusion models. arXiv preprint. arXiv:2310.10647. 2023 [Electronic resource]. URL: https://arxiv.org/abs/2310.10647 (accessed: 12.12.2025).
- Melnik A., Ljubljanac M., Lu C. et al. Video diffusion models: A survey. arXiv preprint. arXiv:2405.03150. 2024 [Electronic resource]. URL: https://arxiv.org/abs/2405.03150 (accessed: 12.12.2025).
- Lei W., Wang J., Ma F. et al. A comprehensive survey on human video generation: Challenges, methods, and insights. arXiv preprint. arXiv:2407.08428. 2024 [Electronic resource]. URL: https://arxiv.org/abs/2407.08428 (accessed: 12.12.2025).
- Li H., Zhang Y., Shi H. et al. A survey: Spatiotemporal consistency in video generation. arXiv preprint. arXiv:2502.17863. 2025 [Electronic resource]. URL: https://arxiv.org/abs/2502.17863 (accessed: 12.12.2025).
- Chen M., Liu X. et al. Neural video generation: State-of-the-art and future directions. ACM Computing Surveys. 2024. V. 56. № 2. P. 1–35.
- Zhang H., Goodfellow I., Metaxas D. et al. Self-attention generative adversarial networks. Proceedings of the 36th International Conference on Machine Learning. 2019. V. 97. P. 7354–7363 [Electronic resource]. URL: https://proceedings.mlr.press/v97/zhang19d.html (accessed: 12.12.2025).
- Karras T., Aittala M., Hellsten J. et al. Alias-free generative adversarial networks. arXiv preprint. arXiv:2106.12423. 2021 [Electronic resource]. URL: https://arxiv.org/abs/2106.12423 (accessed: 12.12.2025).
- Luo Z. et al. VideoFusion: Decomposed diffusion models for high-quality video generation. arXiv preprint. arXiv:2303.08320. 2023 [Electronic resource]. URL: https://arxiv.org/abs/2303.08320 (accessed: 12.12.2025).
- Sora is here. OpenAI. 9 Dec 2024 [Electronic resource]. URL: https://openai.com/index/sora-is-here/.
- Villegas R., Yang J., Zou Y. et al. Phenaki: Variable length video generation from open domain textual descriptions. arXiv preprint. arXiv:2209.06794. 2022 [Electronic resource]. URL: https://arxiv.org/abs/2209.06794 (accessed: 12.12.2025).
- Associated Press. OpenAI releases AI video generator Sora but limits how it depicts people. 2024 [Electronic resource]. URL: https://apnews.com/article/openai-sora-generative-ai-texttovideo-214d578d048f39c9c7b327f870dc6df8.
- Introducing Runway Gen-4. Runway. 2025 [Electronic resource]. URL: https://runwayml.com/research/introducing-runway-gen-4.
- Methodologies for the subjective assessment of the quality of television images. Recommendation ITU-R BT.500-15. 2023 [Electronic resource]. URL: https://www.itu.int/rec/R-REC-BT.500-15-202305-I (accessed: 12.12.2025).
- Unterthiner T., van Steenkiste S., Kurach K. et al. Towards accurate generative models of video: A new metric and challenges. arXiv preprint. arXiv:1812.01717. 2019 [Electronic resource]. URL: https://arxiv.org/abs/1812.01717 (accessed: 12.12.2025).
- Luo G.Y., Favero G.M., Luo Z.H. et al. Beyond FVD: Enhanced evaluation metrics for video generation quality. arXiv preprint. arXiv:2410.05203. 2024 [Electronic resource]. URL: https://arxiv.org/abs/2410.05203 (accessed: 12.12.2025).
- Kim P.J., Kim S., Yoo J. STREAM: Spatio-temporal evaluation and analysis metric for video generative models. arXiv preprint. arXiv:2403.09669. 2024 [Electronic resource]. URL: https://arxiv.org/abs/2403.09669 (accessed: 12.12.2025).
- Liu J., Qu Y., Yan Q. et al. Fréchet video motion distance: A metric for evaluating motion consistency in videos. arXiv preprint. arXiv:2407.16124. 2024 [Electronic resource]. URL: https://arxiv.org/abs/2407.16124 (accessed: 12.12.2025).
- Mavlankar A., Li Z., Krasula L. All of Netflix's HDR video streaming is now dynamically optimized. Netflix TechBlog. 29 Nov 2023 [Electronic resource]. URL: https://netflixtechblog.com/all-of-netflixs-hdr-video-streaming-is-now-dynamically-optimized-e9e0cb15f2ba.
- Radford A. et al. Learning transferable visual models from natural language supervision. arXiv preprint. arXiv:2103.00020. 2021 [Electronic resource]. URL: https://arxiv.org/abs/2103.00020 (accessed: 12.12.2025).
- Fridman A.Ya., Kulikova D.S., Osipov V.Yu., Druzhinin V.Yu. An ontological model of a digital twin for intelligent control systems. Problemy upravleniya bezopasnost'yu slozhnykh sistem. 2022. № 3. P. 34–49. (in Russian)
- Purdue University. Digital Twin & Robotic Automation Center (DigiTRACKER). 2023 [Electronic resource]. URL: https://engineering.purdue.edu/digitwin/.
- Deceptive audio or visual media («deepfakes»). 2024 [Electronic resource]. URL: https://www.ncsl.org/technology-and-communication/deceptive-audio-or-visual-media-deepfakes-2024-legislation.
- Report on deepfakes: What the Copyright Office found and what comes next in AI regulation. Reuters. 18 Dec 2024 [Electronic resource]. URL: https://www.reuters.com/legal/government/report-deepfakes-what-copyright-office-found-what-comes-next-ai-regulation-2024-12-18/.

