O.I. Sheluhin – Dr. Sc. (Eng.), Professor, Head of the Department of «Information Security», Moscow Technical University of Communications and Informatics, E-mail: sheluhin@mail.ru
D.V. Kostin – Post-graduate Student, Department of «Information Security», Moscow Technical University of Communications and Informatics, E-mail: d.v.kostin@mail.ru
The huge volume of logs requires automated methods for processing unstructured data, because it is impractical to inspect log messages manually for key diagnostic information. In real computer systems, anomalous events can have various causes, so identifying the type of anomaly calls for multi-class classification methods. Since the number of anomalies is usually small, we use both abnormal and normal events to train the machine learning algorithm. In this paper, we introduce a two-stage scheme for identifying the type of anomaly. First, a binary classification problem is solved using two classes: 1 – «anomaly», 0 – «normal state». Second, the specific type of anomaly is determined. Evaluation of the two-stage classification scheme using cross-validation showed that the Logistic Regression and Random Forest algorithms perform best. These algorithms also achieved better accuracy on the binary classification problem of detecting anomalous states. We also show that such algorithms do not work well for determining the types of mixed events.
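The two-stage scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a 1-nearest-neighbor classifier over bag-of-words vectors stands in for the algorithms evaluated in the paper (Logistic Regression, Random Forest and others), and the log messages, the labels «disk» and «auth», and all class and function names are hypothetical.

```python
from collections import Counter


def tokenize(message):
    """Split a log message into lowercase tokens."""
    return message.lower().split()


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = sum(v * v for v in a.values()) ** 0.5
    norm_b = sum(v * v for v in b.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class NearestNeighbor:
    """1-nearest-neighbor classifier (a stand-in, not the paper's model)."""

    def fit(self, messages, labels):
        self.vectors = [Counter(tokenize(m)) for m in messages]
        self.labels = list(labels)
        return self

    def predict(self, message):
        query = Counter(tokenize(message))
        best = max(range(len(self.vectors)),
                   key=lambda i: cosine(query, self.vectors[i]))
        return self.labels[best]


class TwoStageClassifier:
    """Stage 1: anomaly (1) vs. normal state (0). Stage 2: anomaly type."""

    def fit(self, messages, anomaly_types):
        # anomaly_types[i] is None for a normal event, else a type label.
        binary = [0 if t is None else 1 for t in anomaly_types]
        # Stage 1 is trained on both normal and abnormal events.
        self.detector = NearestNeighbor().fit(messages, binary)
        # Stage 2 is trained on abnormal events only.
        abnormal = [(m, t) for m, t in zip(messages, anomaly_types)
                    if t is not None]
        self.typer = NearestNeighbor().fit([m for m, _ in abnormal],
                                           [t for _, t in abnormal])
        return self

    def predict(self, message):
        if self.detector.predict(message) == 0:
            return "normal"
        return self.typer.predict(message)


# Hypothetical training data, invented for illustration only.
train_messages = [
    "session opened for user root",
    "connection established to database",
    "disk failure detected on sda",
    "disk read error on sdb",
    "authentication failure for user admin",
    "authentication failure for user guest",
]
train_types = [None, None, "disk", "disk", "auth", "auth"]

model = TwoStageClassifier().fit(train_messages, train_types)
print(model.predict("disk write error on sdc"))
print(model.predict("session opened for user alice"))
```

Only messages flagged as anomalous by the first stage reach the second classifier, so the second stage never has to separate anomaly types from the much larger «normal state» class.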
We improved the accuracy of identifying the type of anomaly by using the «name of the logging component» as an attribute. This attribute is available in many computer systems. The paper shows that it increases accuracy by 10%. The greatest increase in accuracy was observed for k-nearest neighbors and Logistic Regression.
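One simple way to feed the «name of the logging component» to a classifier is to append it to the bag-of-words features of the message as a dedicated token. The sketch below assumes this encoding; the paper does not state how the attribute is represented, and the function name, the `__component__=` prefix, and the example components are hypothetical.

```python
from collections import Counter


def featurize(message, component=None):
    """Bag-of-words features, optionally extended with the component name."""
    features = Counter(message.lower().split())
    if component is not None:
        # A prefixed token keeps the component distinct from message words.
        features["__component__=" + component.lower()] += 1
    return features


# Identical message text logged by two different (hypothetical) components.
text = "connection timed out"
without_a = featurize(text)
without_b = featurize(text)
with_sshd = featurize(text, component="sshd")
with_kernel = featurize(text, component="kernel")

print(without_a == without_b)    # text alone cannot tell the events apart
print(with_sshd == with_kernel)  # the component token separates them
```

With the extra token, events that share the same message text but originate from different components no longer map to the same feature vector.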
Sheluhin O.I., Kostin D.V. Classification of anomalous states of computer systems by means of intellectual analysis of system journals. Neurocomputers. 2020. V. 22. № 1. P. 66–76. DOI: 10.18127/j19998554-202001-07.