D.V. Berezkin – Ph.D.(Eng.), Associate Professor, Department «Computer systems and networks», Bauman Moscow State Technical University
A.Yu. Rozhnev – architect, LLC «Technoserv Consulting» (Moscow)
E-mail: firstname.lastname@example.org, email@example.com
This article describes the development of a model for assessing the creditworthiness of customers applying for credit products. The model is based on a study of borrowers' behavioral profiles and uses historical data on payments made in previous periods. It relies on machine learning algorithms to solve a classification problem: deciding whether a customer's next payment belongs to the «overdue» class. The article reviews existing credit scoring systems that are popular in banks, both foreign and domestic, and shows that such a system can be developed without commercial software. Building the model requires a preliminary in-depth analysis of historical data from the bank's loan portfolio. On this basis, and taking into account business requirements and the credit policy of the financial institution, a high-quality model can be built using machine learning and data mining methods. Expensive software products are not needed: the problem can be solved with publicly available machine learning libraries for the Python programming language. In particular, the Pandas library was used in this work.
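To make the classification framing concrete, here is a minimal sketch that turns a client's payment history into a numeric feature vector and a binary «overdue» label. The record layout is a hypothetical assumption for illustration and is not the article's data schema. It covers the `Client` class, a list of monthly repayment statuses in which a value above zero is taken to mean months of delay, and the three derived summary features. The article handled data with Pandas; this stdlib-only version just shows the shape of the problem.

```python
from dataclasses import dataclass

@dataclass
class Client:
    """Hypothetical record: one client with a monthly repayment history."""
    client_id: int
    credit_limit: float
    statuses: list[int]   # assumed: status per past month, oldest first; > 0 = months of delay
    next_overdue: bool    # target: is the next payment overdue?

def make_features(client: Client) -> list[float]:
    """Summarize the payment history as a fixed-length numeric vector."""
    overdue_months = sum(1 for s in client.statuses if s > 0)
    max_delay = max((s for s in client.statuses if s > 0), default=0)
    last_status = client.statuses[-1] if client.statuses else 0
    return [client.credit_limit, float(overdue_months), float(max_delay), float(last_status)]

def make_dataset(clients: list[Client]) -> tuple[list[list[float]], list[int]]:
    """Build the feature matrix X and the 0/1 target y (1 = overdue)."""
    X = [make_features(c) for c in clients]
    y = [1 if c.next_overdue else 0 for c in clients]
    return X, y

clients = [
    Client(1, 20000.0, [-1, -1, 0, 1, 2, 2], True),
    Client(2, 90000.0, [-1, -1, -1, -1, -1, -1], False),
]
X, y = make_dataset(clients)
print(X)  # [[20000.0, 3.0, 2.0, 2.0], [90000.0, 0.0, 0.0, -1.0]]
print(y)  # [1, 0]
```

Any of the classifiers discussed in the article could then be trained on `X` and `y`. Choosing which summary features to derive is exactly the kind of decision the preliminary data analysis is meant to inform.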
The machine learning algorithms that can be applied to this problem are investigated: they are reviewed and implemented in Python, and their parameters are analyzed. The optimal algorithm was selected by comparing the models on several metrics, taking into account the specifics of the banking business.
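One common way to carry out this kind of parameter analysis and model comparison is k-fold cross-validation. The sketch below is written in plain Python so that it runs without third-party packages. It tunes the number of neighbors `k` for a k-nearest-neighbors classifier, one of the algorithms the article lists, by mean held-out accuracy. Several choices here are illustrative assumptions: the fold count, the candidate values of `k`, using accuracy as the only criterion, and the toy two-cluster data. The article does not state which tooling or parameter grid it used, and in practice a library such as scikit-learn (not named in the article) would normally handle this step.

```python
import math
import random

def knn_predict(train_X, train_y, x, k):
    """Predict 0/1 by majority vote among the k nearest training points (Euclidean)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > k else 0

def k_fold_indices(n, n_folds, seed=0):
    """Shuffle indices 0..n-1 once and deal them into n_folds disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]

def cv_accuracy(X, y, k, n_folds=5):
    """Mean held-out accuracy of k-NN over n_folds cross-validation folds."""
    scores = []
    for fold in k_fold_indices(len(X), n_folds):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / len(scores)

def select_k(X, y, candidates=(1, 3, 5, 7), n_folds=5):
    """Return the candidate k with the best cross-validated accuracy (first one on ties)."""
    return max(candidates, key=lambda k: cv_accuracy(X, y, k, n_folds))

# Hypothetical toy data: two well-separated groups, 0 = paid on time, 1 = overdue.
rng = random.Random(42)
X = ([[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(20)]
     + [[10 + rng.uniform(-1, 1), 10 + rng.uniform(-1, 1)] for _ in range(20)])
y = [0] * 20 + [1] * 20
best_k = select_k(X, y)
print(best_k, cv_accuracy(X, y, best_k))
```

The same loop can score other candidate models (a decision tree, logistic regression, a random forest) on identical folds, so that all of them are compared under the same conditions. Replacing accuracy with recall or AUC as the selection criterion changes what «optimal» means for the bank.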
To achieve this goal, a statistical and visual analysis of the initial data was carried out, and plots of the interdependencies between attributes of the initial historical sample were constructed. The original data set contains credit card data from one bank, including overdue payments, demographic data, loan information, and payment history. The data are organized by payment, and the target indicator is whether the payment for October 2005 was overdue. Models based on decision trees, logistic regression, k-nearest neighbors, and random forest were developed, and their optimal parameters were selected by cross-validation. The decision tree model is visualized, and the resulting graph is described. The models were compared using accuracy, precision, recall, the confusion matrix, and the ROC curve (its shape and the area under it, AUC). The decision tree model is recommended for bank credit scoring because of the high accuracy of its results (81% of events predicted correctly) and because its graphical form is easy to interpret. The results are compared with those of similar studies.
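The comparison metrics named above can be computed directly from a model's predictions. The following sketch shows how each one is defined for the binary «overdue» task. The labels, scores, and the 0.5 decision threshold are invented for illustration and are not results from the article. The AUC is computed as the probability that a randomly chosen overdue case receives a higher score than a randomly chosen on-time case, with ties counted as one half. This O(P·N) pairwise form is fine for a sketch but slow on a full portfolio. Library versions of these functions exist, for example in `sklearn.metrics`, but the article does not say which implementation it used. If overdue payments are a minority class, accuracy alone can look high even for a model that never predicts «overdue». That is one reason precision, recall, and AUC are reported alongside it.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, where 1 = overdue."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp

def accuracy(y_true, y_pred):
    """Share of all predictions that are correct."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
    return (tp + tn) / (tn + fp + fn + tp)

def precision(y_true, y_pred):
    """Share of predicted 'overdue' payments that really were overdue."""
    _, fp, _, tp = confusion_matrix(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0

def recall(y_true, y_pred):
    """Share of truly overdue payments that the model caught."""
    _, _, fn, tp = confusion_matrix(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0

def roc_auc(y_true, scores):
    """Probability that a random overdue case outscores a random on-time case (ties = 1/2)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and model scores (estimated probability of "overdue").
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.3, 0.6, 0.4, 0.7, 0.55, 0.8, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]   # assumed decision threshold 0.5
print(confusion_matrix(y_true, y_pred))  # (2, 2, 1, 3)
print(accuracy(y_true, y_pred), precision(y_true, y_pred), recall(y_true, y_pred))  # 0.625 0.6 0.75
print(roc_auc(y_true, scores))  # 0.8125
```

Accuracy, precision, and recall depend on the chosen threshold, whereas AUC summarizes ranking quality over all thresholds. This is why the ROC curve is a useful complement when a bank has not yet fixed its cut-off for rejecting applicants.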
The approach described in this article for selecting the model of a machine-learning-based decision support system can be applied both in existing bank credit scoring systems and in those under development.