Journal Neurocomputers, No. 4, 2022
Article in issue:
Exploratory data analysis of dataset with bank card transactions
Type of article: scientific article
DOI: https://doi.org/10.18127/j19998554-202204-03
UDC: 004.048
Authors:

G.S. Ivanova1, Y.O. Fokina2

1,2 Bauman Moscow State Technical University (Moscow, Russia)

Abstract:

Problem statement. Most existing antifraud systems rely on rules rather than machine learning algorithms, which significantly reduces the quality of fraud detection. Building an accurate and efficient machine learning model requires a preliminary analysis of the raw data. This paper presents an exploratory data analysis (EDA) of a dataset of customer bank card transactions. Because the dataset has been compressed, the relationships between its variables are not obvious; exploratory analysis makes it possible to uncover them.

Purpose. To perform an exploratory analysis of a dataset containing information about users' card transactions.

Results. The typical sequence of steps in exploratory data analysis is described. Following these steps, an EDA was carried out on a dataset containing information about the bank card transactions of bank clients. The analysis revealed a large class imbalance between fraudulent and legitimate transactions and identified the variables needed for model training.

Practical significance. The results can be used to develop a fraud-monitoring system based on machine learning algorithms that use the described publicly available dataset. The analysis makes it possible to use the available training data more efficiently and to select the best possible model.
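The class imbalance check mentioned in the results can be illustrated with a minimal sketch. It is not the authors' code: the helper name `class_balance`, the label encoding (0 for legitimate, 1 for fraudulent), and the synthetic label list are assumptions made purely for illustration, and the numbers are not figures from the article.

```python
from collections import Counter


def class_balance(labels):
    """Return {label: (count, share)} for an iterable of class labels.

    Hypothetical helper, introduced here only for illustration.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("labels must not be empty")
    return {label: (n, n / total) for label, n in counts.items()}


# Synthetic labels for illustration only (NOT data from the article):
# 0 = legitimate transaction, 1 = fraudulent transaction (assumed encoding).
labels = [0] * 995 + [1] * 5

balance = class_balance(labels)
for label, (n, share) in sorted(balance.items()):
    print(f"class {label}: {n} rows ({share:.2%})")

# Ratio of the majority class to the minority class;
# a large value signals a strong imbalance.
class_counts = [n for n, _ in balance.values()]
imbalance_ratio = max(class_counts) / min(class_counts)
print(f"imbalance ratio: {imbalance_ratio:.0f}:1")
```

With real data, the label list would come from the target column of the dataset. A strongly skewed ratio is usually a cue to prefer metrics such as precision and recall over plain accuracy when a model is later evaluated.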

Pages: 28-38
For citation

Ivanova G.S., Fokina Y.O. Exploratory data analysis of dataset with bank card transactions. Neurocomputers. 2022. V. 24. № 4. P. 28-38. DOI: https://doi.org/10.18127/j19998554-202204-03 (in Russian)

Date of receipt: 17.05.2022
Approved after review: 31.05.2022
Accepted for publication: 23.06.2022