Journal Neurocomputers, No. 1, 2025
Article in issue:
Research of unsupervised machine learning methods for data analysis and preparation and modern approaches to building machine learning models – automated machine learning systems
Type of article: scientific article
DOI: https://doi.org/10.18127/j19998554-202501-04
UDC: 004.89
Authors:

I.A. Popova1, G.I. Afanasyev2, V.B. Timofeev3, Yu.E. Gapanyuk4
1–4 Bauman Moscow State Technical University (Moscow, Russia)

1 popovai1@student.bmstu.ru, 2 gaipcs@bmstu.ru, 3 vbtimofeev@yandex.ru, 4 gapyu@bmstu.ru

Abstract:

Data preprocessing is an important step in the data mining process for machine learning tasks. To automate preprocessing and adapt it to the data under study, AutoML systems implement data preprocessing methods. The purpose of this work is to compare how well AutoML systems build and train a target model. The operation of modern AutoML systems has been studied. Recommendations are proposed for using unsupervised machine learning algorithms to fill in missing values, detect and remove anomalies, and reduce the dimensionality of a data set. The research makes it possible to assess the applicability of modern AutoML systems for building a machine learning model, to better understand the features of these systems, and to determine whether they can be used to solve practical problems.

Pages: 47-55
Date of receipt: 26.07.2024
Approved after review: 21.08.2024
Accepted for publication: 24.01.2025