I.A. Popova¹, G.I. Afanasyev², V.B. Timofeev³, Yu.E. Gapanyuk⁴
¹⁻⁴ Bauman Moscow State Technical University (Moscow, Russia)
¹ popovai1@student.bmstu.ru, ² gaipcs@bmstu.ru, ³ vbtimofeev@yandex.ru, ⁴ gapyu@bmstu.ru
Data preprocessing is an important step in the data mining process for machine learning tasks. AutoML systems implement data preprocessing methods in order to automate this step and adapt it to the data under study. The purpose of this work is to compare how well AutoML systems build and train a target model. A study of the operation of modern AutoML systems has been conducted. Recommendations are proposed for using unsupervised machine learning algorithms to fill gaps, detect and remove anomalies, and reduce the dimensionality of a data set. The research makes it possible to determine the applicability of modern AutoML systems for building a machine learning model, to better understand the features of these systems, and to assess whether they can be used to solve practical problems.
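The three preprocessing tasks named above (filling gaps, removing anomalies, reducing dimensionality) can be illustrated with a minimal standard-library Python sketch. The algorithm choices here are assumptions made for illustration only: nearest-neighbour imputation, a nearest-neighbour distance score for anomalies, and projection onto the first principal component by power iteration. They are not necessarily the methods the study recommends, and all function names, parameters, and the toy data are hypothetical.

```python
import math
from statistics import mean, median


def knn_impute(rows, k=2):
    """Fill None entries with the mean of the k nearest complete rows.

    Assumes at least k rows contain no missing values.
    """
    complete = [r for r in rows if None not in r]
    filled = []
    for row in rows:
        if None not in row:
            filled.append(list(row))
            continue
        observed = [i for i, v in enumerate(row) if v is not None]
        # Distance uses only the features that this row actually has.
        nearest = sorted(
            complete,
            key=lambda c: math.dist([row[i] for i in observed],
                                    [c[i] for i in observed]),
        )[:k]
        filled.append([
            v if v is not None else mean(n[i] for n in nearest)
            for i, v in enumerate(row)
        ])
    return filled


def knn_scores(rows, k=2):
    """Anomaly score of a row: mean distance to its k nearest other rows."""
    scores = []
    for i, row in enumerate(rows):
        dists = sorted(math.dist(row, other)
                       for j, other in enumerate(rows) if j != i)
        scores.append(mean(dists[:k]))
    return scores


def drop_anomalies(rows, k=2, factor=3.0):
    """Keep rows whose score is at most `factor` times the median score."""
    scores = knn_scores(rows, k)
    cutoff = factor * median(scores)
    return [r for r, s in zip(rows, scores) if s <= cutoff]


def first_principal_component(rows, iterations=100):
    """Project centred rows onto the top principal axis (power iteration).

    Assumes the data has non-zero variance, so the norm below is never zero.
    """
    n, d = len(rows), len(rows[0])
    means = [mean(r[j] for r in rows) for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    axis = [1.0] * d
    for _ in range(iterations):
        axis = [sum(cov[a][b] * axis[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(v * v for v in axis))
        axis = [v / norm for v in axis]
    return [sum(r[j] * axis[j] for j in range(d)) for r in centred]


# Toy data (hypothetical): one gap and one obvious outlier.
rows = [
    [1.0, 2.0],
    [1.2, None],
    [0.9, 1.9],
    [1.1, 2.1],
    [1.0, 2.2],
    [10.0, 20.0],
]
filled = knn_impute(rows)                        # gap filled from neighbours
cleaned = drop_anomalies(filled)                 # distant row removed
projected = first_principal_component(cleaned)   # two features reduced to one
```

The order of the steps matters in this sketch: the distance-based anomaly score and the covariance computation both require complete rows, so imputation runs first. In practice a library implementation would normally replace each of these functions.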

