D.V. Berezkin, Shi Runfang, Li Tengjiao
Bauman Moscow State Technical University (Moscow, Russia)
This experiment compared the performance of four machine learning algorithms in detecting bank card fraud. The comparison took into account both the strong class imbalance in the training sample and the differences in transaction amounts, and assessed how well each method recognizes fraudulent behavior under these conditions. A method that scores well on standard classification metrics does not necessarily perform best when judged by the magnitude of economic losses; logistic regression is a clear example of this.
The results of this work show that bank card fraud detection cannot be treated as a simple classification problem, and that AUC is not the most appropriate metric for it. The final choice of model depends on the needs of the bank, that is, on which of the two error types, false negatives (FN) or false positives (FP), leads to larger economic losses for the bank. If the bank considers the loss from classifying fraudulent transactions as regular ones to be the dominant one, it should choose the algorithm with the lowest FN value, which in this experiment is AdaBoost. If the bank considers the negative impact of flagging regular transactions as fraudulent to be important as well, it should choose an algorithm with relatively low FN and FP values; in this experiment, random forest shows the better overall performance. Furthermore, once the economic losses caused by false positives (flagging an ordinary transaction as fraudulent) are estimated, a quantitative analysis of the losses caused by each algorithm can be used to select the optimal model.
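The loss-based comparison described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that a missed fraud (FN) costs the bank the full transaction amount and that each false alarm (FP) carries a fixed, bank-specific cost. The names `economic_loss`, `pick_model`, and `fp_cost`, as well as the toy labels, predictions, and amounts, are hypothetical and chosen only for illustration.

```python
def economic_loss(y_true, y_pred, amounts, fp_cost=1.0):
    """Return (fn_count, fp_count, total_loss) for one model.

    Assumptions (not from the paper): a missed fraud costs the full
    transaction amount; a false alarm costs a fixed fp_cost.
    Labels: 1 = fraudulent, 0 = regular.
    """
    fn = fp = 0
    loss = 0.0
    for truth, pred, amount in zip(y_true, y_pred, amounts):
        if truth == 1 and pred == 0:    # fraud let through
            fn += 1
            loss += amount
        elif truth == 0 and pred == 1:  # regular transaction blocked
            fp += 1
            loss += fp_cost
    return fn, fp, loss


def pick_model(y_true, amounts, predictions, fp_cost=1.0):
    """Return the name of the model with the lowest total loss."""
    return min(
        predictions,
        key=lambda name: economic_loss(
            y_true, predictions[name], amounts, fp_cost
        )[2],
    )


# Toy data, invented for illustration only.
y_true = [1, 1, 0, 0, 0, 1]
amounts = [500, 20, 30, 40, 10, 15]
predictions = {
    "model_a": [0, 1, 0, 0, 0, 1],  # one error, but it misses the large fraud
    "model_b": [1, 1, 1, 1, 0, 1],  # two errors, both false alarms
}

for name, y_pred in predictions.items():
    print(name, economic_loss(y_true, y_pred, amounts, fp_cost=5.0))
print("lowest loss:", pick_model(y_true, amounts, predictions, fp_cost=5.0))
```

In this toy case `model_a` makes fewer errors, yet `model_b` incurs the lower loss, which mirrors the point that classification metrics and economic losses can rank models differently. The value of `fp_cost` would have to come from the bank's own estimate.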
Berezkin D.V., Shi Runfang, Li Tengjiao. Applying and comparing multiple machine learning techniques to detect fraudulent credit card transactions. Dynamics of complex systems. 2021. V. 15. № 2. P. 5–13. DOI: 10.18127/j19997493-202102-01 (in Russian)