A.N. Chesalin1
1 MIREA - Russian Technological University (Moscow, Russia)
Problem statement. Modern information and analytical systems process huge arrays of heterogeneous, rapidly changing information, commonly called big data; this requires substantial computing power and optimized data processing algorithms. For classifying objects described by heterogeneous information, models based on modern implementations of gradient boosting and bootstrap aggregation are advisable because of the high quality of their predictions. However, when big data is processed in real time, not only the quality of classification matters but also the speed at which it is performed.
The paper considers the problem of improving real-time big data processing systems, using anomaly detection systems as an example, and proposes cascade classifiers, traditionally used in computer vision, to increase the speed of detecting network anomalies. Given the acceptable probabilities of classification errors, a cascade classifier decides whether an anomaly is present or absent using not all of the available features and classifiers, but only the part needed to achieve the required classification quality. The use of cascades is justified when results of the required quality must be obtained under tight time constraints on the detection operation.
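The early-stopping idea behind a cascade can be illustrated with a minimal sketch. It is not the algorithm from the paper: it assumes that each stage returns a log-likelihood-ratio contribution and that stopping thresholds are derived from the tolerated error probabilities as in Wald's sequential probability ratio test. The function names, the stage functions, and the feature names (failed_logins, bytes_out, duration) are hypothetical and chosen only for illustration.

```python
import math


def wald_thresholds(alpha, beta):
    """Log-thresholds of Wald's sequential probability ratio test.

    alpha: tolerated false-alarm probability (normal traffic flagged as anomaly).
    beta:  tolerated miss probability (anomaly classified as normal).
    """
    upper = math.log((1 - beta) / alpha)  # decide "anomaly" at or above this
    lower = math.log(beta / (1 - alpha))  # decide "normal" at or below this
    return lower, upper


def cascade_decide(stages, x, alpha=0.01, beta=0.01):
    """Evaluate stages in order and stop as soon as the evidence is decisive.

    Each stage is a callable returning an (assumed) log-likelihood-ratio
    contribution for sample x. Returns (label, number_of_stages_used).
    """
    lower, upper = wald_thresholds(alpha, beta)
    total = 0.0
    used = 0
    for stage in stages:
        total += stage(x)
        used += 1
        if total >= upper:
            return "anomaly", used
        if total <= lower:
            return "normal", used
    # No threshold was crossed: fall back to the sign of the accumulated score.
    return ("anomaly" if total > 0 else "normal"), used


# Hypothetical stages, ordered from most to least informative.
stages = [
    lambda x: 3.0 if x["failed_logins"] > 5 else -3.0,
    lambda x: 2.0 if x["bytes_out"] > 1e6 else -2.0,
    lambda x: 1.0 if x["duration"] > 60 else -1.0,
]

quiet = {"failed_logins": 0, "bytes_out": 100, "duration": 10}
noisy = {"failed_logins": 10, "bytes_out": 2e6, "duration": 10}
mixed = {"failed_logins": 10, "bytes_out": 100, "duration": 120}

print(cascade_decide(stages, quiet))  # ('normal', 2)
print(cascade_decide(stages, noisy))  # ('anomaly', 2)
print(cascade_decide(stages, mixed))  # ('anomaly', 3)
```

In this toy setting the first two samples are decided after two of the three stages, which is where a cascade saves time; only the ambiguous sample needs every stage. Tightening alpha and beta widens the thresholds and increases the average number of stages evaluated.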
Purpose. To investigate the problem of improving algorithms for detecting network anomalies and to propose the use of cascade classification algorithms, traditionally applied to computer vision problems.
Methods. Methods of machine learning, applied mathematical statistics, and statistical modeling were used.
Results. The most effective algorithms for constructing cascades (Attentional cascade, Boosted Chain, Soft Cascade, WaldBoost, Direct-backward pruning, Entropy-driven evaluation) are studied for the task of detecting network anomalies; pseudocode for their implementation is given, and their advantages and disadvantages are considered. The effectiveness of the studied algorithms is compared on different data sets. The features of using cascades of classifiers in the anomaly detection problem are considered, and the prospects of using the studied algorithms are noted not only in image recognition tasks but also in real-time big data processing tasks, namely in network anomaly detection systems and edge technologies.
Conclusions. The research shows that cascade algorithms can be used effectively in real-time processing of heterogeneous data (big data), for example in network anomaly detection systems and edge technologies, to increase processing speed (efficiency).
Chesalin A.N. Application of cascading classification algorithms to improve intrusion detection systems. Nonlinear World. 2022. V. 20. № 1. P. 24-41. DOI: https://doi.org/10.18127/j20700970-202201-03 (In Russian)