N.V. Gridina – Junior Research Scientist, Assistant of Department,
G.V. Zolotenkova – Ph.D.(Med.), Senior Research Scientist, Associate Professor of Department,
A.I. Rogachev – Research Engineer,
Yu.I. Pigolkin – Corresponding Member, Dr.Sc.(Med.), Professor, Main Research Scientist, Head of Department.
Machine learning methods need to be integrated into forensic expert practice in order to analyze individual parameters comprehensively, account for their relationships, and reveal hidden patterns in the heterogeneous features that characterize age-related changes in human organs and tissues. This article describes the sequence of actions and the tools used to implement the next stage of a project to develop a specialized complex for assessing biological age during forensic medical examination aimed at establishing the identity of an unidentified individual. The goal at this stage was to assess the accuracy and reliability of the results produced by the developed mathematical model for age group classification. The specific challenge of this task is that the models must be trained on a small amount of source data, so feature selection and control of overfitting are extremely important. The following pipeline was implemented: feature engineering based on two-dimensional visualization, followed by an analysis of feature informativeness using decision-tree-based algorithms and the Gini criterion to rank the features; a study of how model quality metrics depend on the number of features used; a comparative analysis of classification algorithms and selection of the best one for the problem and the available data; and an examination of the confusion matrix built for the best algorithm, together with the ROC-AUC for each class, obtained by training one binary classifier per distinct class label. To solve these problems, we used libraries written in Python, with Jupyter Notebook as the interactive environment.
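The pipeline steps listed above can be illustrated with a minimal sketch in scikit-learn. It is not the authors' implementation: the data are synthetic (`make_classification` stands in for the forensic measurements), and the sample size, the three age groups, the candidate feature counts, and the choice of Random Forest for every step are assumptions made only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import (
    StratifiedKFold, cross_val_predict, cross_val_score,
)

# Synthetic stand-in for a small dataset (assumption): 120 samples,
# 20 features, 3 age groups.
X, y = make_classification(
    n_samples=120, n_features=20, n_informative=5, n_redundant=2,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

def make_model():
    # Same hypothetical model settings for every step of the sketch.
    return RandomForestClassifier(n_estimators=100, random_state=0)

# Step 1: rank features by Gini (mean decrease in impurity) importance.
forest = make_model().fit(X, y)
ranking = np.argsort(forest.feature_importances_)[::-1]

# Step 2: cross-validated accuracy as a function of the number of
# top-ranked features kept.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores_by_k = {
    k: cross_val_score(make_model(), X[:, ranking[:k]], y, cv=cv).mean()
    for k in (2, 5, 10, 20)
}
best_k = max(scores_by_k, key=scores_by_k.get)
X_sel = X[:, ranking[:best_k]]

# Step 3: confusion matrix built from out-of-fold predictions.
pred = cross_val_predict(make_model(), X_sel, y, cv=cv)
cm = confusion_matrix(y, pred)

# Step 4: ROC-AUC per class, one binary (class vs. rest) classifier each.
auc_per_class = {}
for c in np.unique(y):
    y_bin = (y == c).astype(int)
    proba = cross_val_predict(
        make_model(), X_sel, y_bin, cv=cv, method="predict_proba"
    )
    auc_per_class[int(c)] = roc_auc_score(y_bin, proba[:, 1])

print("accuracy by number of features:", scores_by_k)
print("confusion matrix:\n", cm)
print("one-vs-rest ROC-AUC:", auc_per_class)
```

For brevity the sketch ranks features on the full dataset before cross-validation, which leaks some information into the scores; with a small sample it would be safer to repeat the ranking inside each fold.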
Tools used: the sklearn, catboost, UMAP, Seaborn, matplotlib, and Bokeh Python libraries; Jupyter Notebook as an interactive environment. Experimental verification confirmed the effectiveness of the described pipeline and toolkit for age group classification, including work with data of different nature and volume. The most important stages are feature engineering and the application of all available regularization methods to avoid overfitting, which matters most when only a small amount of data is available. The optimal approach is to use libraries implemented in Python. The CatBoost and Random Forest algorithms additionally allow computations to be run in parallel.
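The point about regularization and parallel computation can be sketched as follows. This is an illustrative example on synthetic data, not the configuration used in the study: the dataset, the two hyperparameter settings, and their names (`unconstrained`, `regularized`) are assumptions. It compares the gap between training and cross-validated accuracy for a Random Forest with and without depth and leaf-size limits, with trees built in parallel via `n_jobs=-1`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Synthetic small-sample data (assumption): 100 samples, 15 features, 3 classes.
X, y = make_classification(
    n_samples=100, n_features=15, n_informative=4,
    n_classes=3, n_clusters_per_class=1, random_state=1,
)

# Two hypothetical settings: fully grown trees vs. shallow trees with
# a minimum leaf size, which acts as regularization.
configs = {
    "unconstrained": dict(max_depth=None, min_samples_leaf=1),
    "regularized": dict(max_depth=3, min_samples_leaf=5),
}

gaps = {}
for name, params in configs.items():
    # n_jobs=-1 builds the trees on all available CPU cores.
    model = RandomForestClassifier(
        n_estimators=100, n_jobs=-1, random_state=0, **params
    )
    res = cross_validate(model, X, y, cv=5, return_train_score=True)
    train_acc = res["train_score"].mean()
    test_acc = res["test_score"].mean()
    # A large train/validation gap is a common symptom of overfitting.
    gaps[name] = train_acc - test_acc
    print(f"{name}: train={train_acc:.3f}, cv={test_acc:.3f}, gap={gaps[name]:.3f}")
```

CatBoost exposes comparable controls (for example tree depth and L2 leaf regularization), so the same comparison could be repeated with that library.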
Gridina N.V., Zolotenkova G.V., Rogachev A.I., Pigolkin Yu.I. Instrumental support for solving the classification problem in forensic diagnostics of the biological age of an unidentified individual. Highly Available Systems. 2020. V. 16. № 4. P. 64−70. DOI: 10.18127/j20729472-202004-06. (In Russian).