Journal Neurocomputers №2 for 2020
Article in issue:
Monitoring computer systems for anomaly detection by analysis of log files using machine learning techniques
Type of article: scientific article
DOI: 10.18127/j19998554-202002-05
UDC: 621.396, 681.3
Authors:

O.I. Sheluhin – Dr. Sc. (Eng.), Professor, Head of the Department of «Information Security», Moscow Technical University of Communications and Informatics,

E-mail: sheluhin@mail.ru

D.V. Kostin – Post-graduate Student, Department of «Information Security», Moscow Technical University of Communications and Informatics,

E-mail: d.v.kostin@mail.ru

Abstract:

In this paper we analyze methods and algorithms for automated monitoring of a computer system through log file analysis, using a machine learning approach to detect anomalies and diagnose abnormal system states. This allows operators to localize and understand the cause of a problem. The huge volume of logs makes it impractical to inspect log messages manually for key diagnostic information, so machine learning algorithms are used to help developers extract the necessary information from log files. We validate our approach on a 6-node production cluster managed by Rancher. We manually identified events that are abnormal and significantly affect the state of the system. These events are related to network traffic problems or denial of service. Of a total of 988 message patterns, 81 are marked as abnormal.

An anomaly is an instance in the data set that does not correspond to the regular behavior of the system. The anomaly detection system consists of four stages: raw data extraction, log message processing, feature extraction and normalization, and anomaly detection model building. First, logs containing records of system states and information about running processes are collected. Second, a log parser is used to retrieve event templates. As a result, the raw logs become structured.
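The log parsing stage described above can be illustrated with a minimal sketch. It is not the Drain algorithm and not the authors' parser: it simply masks tokens that look like variables (numbers, addresses) with a wildcard, which is an assumed simplification. The masking rules, the function names `to_template` and `parse_logs`, and the sample log lines are all hypothetical.

```python
import re

# Hypothetical masking rules (an assumption, not from the paper):
# tokens that look like variable values are replaced by the wildcard "<*>".
_VARIABLE_PATTERNS = [
    re.compile(r"^\d+(\.\d+){3}(:\d+)?$"),  # IPv4 address with optional port
    re.compile(r"^0x[0-9a-fA-F]+$"),        # hexadecimal value
    re.compile(r"^\d+$"),                   # plain integer
]


def to_template(message):
    """Replace variable-looking tokens with a wildcard to obtain an event template."""
    masked = [
        "<*>" if any(p.match(token) for p in _VARIABLE_PATTERNS) else token
        for token in message.split()
    ]
    return " ".join(masked)


def parse_logs(lines):
    """Map each raw line to (template_id, template) and return the template table."""
    template_ids = {}
    structured = []
    for line in lines:
        template = to_template(line)
        # Assign the next free id the first time a template is seen.
        event_id = template_ids.setdefault(template, len(template_ids))
        structured.append((event_id, template))
    return structured, template_ids


# Made-up log lines, for illustration only.
raw_logs = [
    "Connection from 10.0.0.5:5432 closed",
    "Connection from 10.0.0.7:5432 closed",
    "Worker 17 restarted after 3 retries",
]
structured, templates = parse_logs(raw_logs)
for event_id, template in structured:
    print(event_id, template)
```

The first two lines collapse into one template because they differ only in the masked address, which is the sense in which raw logs become structured.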

We showed that the Drain algorithm, which is based on a heuristic approach using a syntax tree, has the best accuracy on all data sets. After log parsing, the extracted message templates are converted into numeric feature vectors. The collection of all vectors forms a feature matrix. In the fourth stage, the feature matrix is used for binary classification with machine learning algorithms to determine whether a log sequence is abnormal. The following machine learning algorithms were used: k-Nearest Neighbors, Logistic Regression, Gaussian Naive Bayes, Decision Tree Classifier, Random Forest, and Gradient Boosting.
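One common way to turn parsed templates into numeric vectors is to count how often each template occurs in a log sequence. The abstract does not state which encoding the authors used, so the event-count representation below is an assumption, and `build_feature_matrix` and the sample sequences are hypothetical.

```python
from collections import Counter


def build_feature_matrix(sequences, num_templates):
    """Turn each sequence of template ids into a vector of per-template counts."""
    matrix = []
    for sequence in sequences:
        counts = Counter(sequence)
        # One column per template id; templates absent from the sequence get 0.
        matrix.append([counts.get(t, 0) for t in range(num_templates)])
    return matrix


# Made-up log sequences (for example, one per session or time window),
# each given as a list of template ids produced by a log parser.
sequences = [
    [0, 0, 1],
    [2, 2, 2, 1],
    [0],
]
feature_matrix = build_feature_matrix(sequences, num_templates=3)
for row in feature_matrix:
    print(row)
```

Each row is the feature vector of one log sequence, and the rows together form the feature matrix passed to a binary classifier.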

We evaluated the machine learning algorithms on the collected data using K-fold cross-validation. The paper shows that the best results (ROC AUC values) for binary classification of anomalous events were obtained by the following algorithms: Logistic Regression (0.98), Random Forest (0.90), and Gradient Boosting (0.89).
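The evaluation procedure named above, K-fold cross-validation scored by ROC AUC, can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' setup: a simple nearest-centroid scorer stands in for the classifiers listed in the paper, the folds are stratified so that every fold contains both classes, and the data and all function names are hypothetical.

```python
import random


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outranks a random negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, seed=0):
    """Split indices into k folds, dealing each class out round-robin.

    Assumes every class has at least k members, so no fold lacks a class.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        members = [i for i, l in enumerate(labels) if l == cls]
        rng.shuffle(members)
        for position, index in enumerate(members):
            folds[position % k].append(index)
    return folds


def fit_centroids(X, y):
    """Stand-in model (assumption): mean vector of each class."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    normal = mean([x for x, l in zip(X, y) if l == 0])
    abnormal = mean([x for x, l in zip(X, y) if l == 1])
    return normal, abnormal


def anomaly_score(model, x):
    """Higher when x is closer to the abnormal centroid than to the normal one."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    normal, abnormal = model
    return sq_dist(x, normal) - sq_dist(x, abnormal)


def cross_val_auc(X, y, k=3, seed=0):
    """Train on k-1 folds, score the held-out fold, and return one AUC per fold."""
    aucs = []
    for held_out in stratified_folds(y, k, seed):
        test = set(held_out)
        train_X = [x for i, x in enumerate(X) if i not in test]
        train_y = [l for i, l in enumerate(y) if i not in test]
        model = fit_centroids(train_X, train_y)
        scores = [anomaly_score(model, X[i]) for i in held_out]
        aucs.append(roc_auc([y[i] for i in held_out], scores))
    return aucs


# Made-up feature matrix: six normal sequences followed by three abnormal ones.
X = [[2, 1, 0], [3, 1, 0], [2, 2, 0], [3, 2, 0], [2, 1, 1], [3, 1, 1],
     [0, 1, 5], [0, 2, 6], [1, 1, 5]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]
aucs = cross_val_auc(X, y, k=3)
print(sum(aucs) / len(aucs))
```

Averaging the per-fold AUC values gives a single figure of the kind reported in the abstract. In practice a machine learning library would normally supply both the classifiers and the cross-validation loop.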

Pages: 53-65
For citation

Sheluhin O.I., Kostin D.V. Monitoring computer systems for anomaly detection by analysis of log files using machine learning techniques. Neurocomputers. 2020. V. 22. № 2. P. 53–65. DOI: 10.18127/j19998554-202002-05

Date of receipt: 16 January 2020