Development of an approach to classifying borrower default based on the method of training individual classifiers for different data clusters

Journal: Nonlinear World, No. 2, 2026

Type of article: scientific article

DOI: https://doi.org/10.18127/j20700970-202602-07

UDC: 004.89

Keywords: borrower default prediction method, training individual classifiers for data clusters, kMeans

Authors:

A.F. Konstantinov1, L.P. Dyakonova2

1,2 Plekhanov Russian University of Economics (Moscow, Russia)
1 konstantinovaf@gmail.com, 2 Dyakonova.LP@rea.ru

Abstract:

When issuing loans to retail borrowers, banks predict the probability of borrower default from loan application data, external scoring, the history of interaction with the borrower, and other factors. Under the internal ratings-based (IRB) approach, banks are required to reserve funds against their loan portfolio based on the predicted probability of default in order to remain financially stable. This article proposes a hybrid borrower default prediction method that trains individual classifiers on data clusters obtained with the kMeans clustering method. The proposed method is part of ongoing research into an integrated borrower default prediction method that additionally includes class imbalance correction, anomaly detection in a separate model, bagging, and additional optimization during training. The aim of the work is to analyze the performance of clustering followed by borrower default classification and to propose a way of incorporating this step into the integrated borrower default prediction method.
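The idea of clustering first and then training one classifier per cluster can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which classifier, features, or number of clusters were used, so a small logistic regression stands in for the classifier, the two-feature data are synthetic, and every function name below is hypothetical. Only the Python standard library is used.

```python
import math
import random

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def nearest(x, centroids):
    """Index of the centroid closest to x (squared Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: sq_dist(x, centroids[j]))

def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns the list of k centroids."""
    rng = random.Random(seed)
    centroids = [list(x) for x in rng.sample(X, k)]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for x in X:
            groups[nearest(x, centroids)].append(x)
        for j, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster became empty
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, x):
    """Default probability from weights w (bias first) and features x."""
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))

def fit_logreg(X, y, lr=0.1, epochs=200):
    """Stand-in classifier: logistic regression by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, t in zip(X, y):
            err = predict_proba(w, x) - t
            grad[0] += err
            for i, xi in enumerate(x):
                grad[i + 1] += err * xi
        w = [wi - lr * gi / len(X) for wi, gi in zip(w, grad)]
    return w

def fit_cluster_models(X, y, k):
    """Cluster the training data, then train one classifier per cluster."""
    centroids = kmeans(X, k)
    labels = [nearest(x, centroids) for x in X]
    models = {}
    for j in range(k):
        idx = [i for i, c in enumerate(labels) if c == j]
        if idx:
            models[j] = fit_logreg([X[i] for i in idx], [y[i] for i in idx])
    fallback = fit_logreg(X, y)  # global model for clusters with no data
    return centroids, models, fallback

def predict_default(x, centroids, models, fallback):
    """Route x to its cluster's classifier and return a default probability."""
    return predict_proba(models.get(nearest(x, centroids), fallback), x)

# Synthetic data (assumption): two borrower segments in which the same
# feature pushes default risk in opposite directions.
rng = random.Random(42)
X, y = [], []
for _ in range(200):
    center = rng.choice([0.0, 4.0])
    x1, x2 = rng.gauss(center, 1.0), rng.gauss(center, 1.0)
    sign = 1.0 if center == 0.0 else -1.0
    p = sigmoid(2.0 * sign * (x1 - center))
    X.append([x1, x2])
    y.append(1 if rng.random() < p else 0)

centroids, models, fallback = fit_cluster_models(X, y, k=2)
probs = [predict_default(x, centroids, models, fallback) for x in X]
```

At prediction time a new application is first assigned to the nearest centroid and then scored by that cluster's classifier. The global fallback model is a design choice of this sketch for clusters that end up with no training data.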

The proposed method substantially improved quality metrics relative to a baseline model trained without dividing the training data into clusters: average accuracy increased by 0.139, F1-score by 0.221, and accuracy by 0.392. The practical value of these results lies in reducing the borrower default rate, reducing the amount of funds banks must reserve, and accelerating the development of high-quality machine learning models. The results can also be incorporated into undergraduate programs in artificial intelligence and machine learning that focus on financial data.
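How such metric gains are computed can be sketched as follows, assuming the reported figures are absolute differences between the clustered model and the baseline (the abstract does not say so explicitly). The labels and predictions below are toy values chosen for illustration, not the article's data, and the function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred):
    """F1 for the positive (default) class: harmonic mean of precision and recall."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy labels (1 = default) and predictions from two hypothetical models.
y_true         = [1, 0, 0, 1, 0, 1, 0, 0]
baseline_pred  = [0, 0, 0, 1, 0, 0, 1, 0]
clustered_pred = [1, 0, 0, 1, 0, 0, 0, 0]

# Gain of the clustered model over the baseline, as an absolute difference.
delta_accuracy = accuracy(y_true, clustered_pred) - accuracy(y_true, baseline_pred)
delta_f1 = f1_score(y_true, clustered_pred) - f1_score(y_true, baseline_pred)
```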

Pages: 58-68

For citation

Konstantinov A.F., Dyakonova L.P. Development of an approach to classifying borrower default based on the method of training individual classifiers for different data clusters. Nonlinear World. 2026. V. 24. № 2. P. 58–68. DOI: https://doi.org/10.18127/j20700970-202602-07 (In Russian)

Date of receipt: 04.02.2026

Approved after review: 26.02.2026

Accepted for publication: 03.04.2026