D.S. Kumankin¹, S.A. Yamashkin²
¹,² National Research Mordovia State University (Saransk, Russia)
Machine learning researchers often use a manual approach to create machine learning models, but this is inefficient if the model structure, hyperparameters, and training and testing data change frequently. To automate these processes, a machine learning pipeline must be set up to track data changes, pre-process data, train and tune models, control model versions, and monitor model quality and performance.
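The responsibilities listed above can be illustrated with a minimal sketch. It is not the architecture proposed in the article: the names `fingerprint`, `preprocess`, `train`, `ModelVersion`, and `Pipeline` are hypothetical, the "model" is a toy threshold classifier, and tuning is done on the training data purely for brevity. It only shows how data-change tracking, pre-processing, training with tuning, versioning, and quality recording might be chained together.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Callable, List, Optional

Row = List[float]


def fingerprint(rows: List[Row], labels: List[int]) -> str:
    """Hash the dataset so that a change in the data can be detected."""
    payload = json.dumps({"rows": rows, "labels": labels}).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def preprocess(rows: List[Row]) -> List[Row]:
    """Toy pre-processing: scale all values by the largest absolute value."""
    peak = max(abs(v) for row in rows for v in row) or 1.0
    return [[v / peak for v in row] for row in rows]


def train(threshold: float) -> Callable[[Row], int]:
    """Toy 'model': predict 1 when the mean of a row exceeds the threshold."""
    return lambda row: int(sum(row) / len(row) > threshold)


def accuracy(model: Callable[[Row], int], rows: List[Row], labels: List[int]) -> float:
    """Share of rows for which the model prediction matches the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


@dataclass
class ModelVersion:
    version: int        # monotonically increasing version number
    data_hash: str      # fingerprint of the data the model was built from
    threshold: float    # tuned hyperparameter
    accuracy: float     # recorded quality metric


@dataclass
class Pipeline:
    registry: List[ModelVersion] = field(default_factory=list)

    def run(self, rows: List[Row], labels: List[int]) -> Optional[ModelVersion]:
        # 1. Track data changes: skip the run if the data is unchanged.
        data_hash = fingerprint(rows, labels)
        if self.registry and self.registry[-1].data_hash == data_hash:
            return None
        # 2. Pre-process the data.
        features = preprocess(rows)
        # 3. Train and tune: try a small grid of thresholds, keep the best.
        best_acc, best_t = max(
            (accuracy(train(t), features, labels), t) for t in (0.25, 0.5, 0.75)
        )
        # 4. Version the model and 5. record its quality for monitoring.
        entry = ModelVersion(len(self.registry) + 1, data_hash, best_t, best_acc)
        self.registry.append(entry)
        return entry


if __name__ == "__main__":
    pipeline = Pipeline()
    rows = [[0.1, 0.2], [0.9, 1.0], [0.2, 0.1], [0.8, 0.9]]
    labels = [0, 1, 0, 1]
    print(pipeline.run(rows, labels))  # first run registers a new version
    print(pipeline.run(rows, labels))  # unchanged data: nothing is retrained
```

In a real system each step would typically be delegated to a dedicated tool (a data versioning system, an orchestrator, a model registry); the sketch keeps everything in memory so that the control flow is visible.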
The purpose of the study is to review the architectural design principles of machine learning pipelines and to analyze existing systems in the context of remote sensing (RS) tasks.
The methodological basis of the study is the analysis and generalization of existing architectural approaches to the implementation of machine learning pipelines.
This study examines architectural approaches to designing machine learning pipelines for remote sensing and analyzes existing systems for orchestration, for tracking data evolution, and for pipeline design.
The article discusses the main stages of the life cycle of machine learning models for remote sensing tasks. It reviews existing solutions and describes the architectural principles underlying the development of effective machine learning pipelines. An architecture for a machine learning pipeline is proposed, and its main components and the connections between them are considered.
The results of the study can be applied to the implementation of effective and scalable machine learning systems aimed at solving problems arising in the field of remote sensing.
Kumankin D.S., Yamashkin S.A. Architectural principles for constructing machine learning pipelines for solving the problem of controlling the process of analyzing Earth remote sensing data. Nonlinear World. 2023. V. 21. № 3. P. 27-37. DOI: https://doi.org/10.18127/j20700970-202303-03 (In Russian)