Using machine learning methods to detect depression among users of the social network Reddit

Journal Neurocomputers №5 for 2024

Type of article: scientific article

DOI: 10.18127/j19998554-202405-05

UDC: 21.314.21+004.896

Keywords: data analysis; mental disorders; sentiment analysis; machine learning; support vector machine; naive Bayes classifier; logistic regression algorithm; multilayer perceptron; LSTM neural network; BERT neural network

Authors:

V.G. Lyalikova1, M.M. Bezryadin2, D.Yu. Golovanov3

1–3 Voronezh State University (Voronezh, Russia)

1 vikalg@yandex.ru, 2 maickel@yandex.ru, 3 dmitry.golovanov.1988@gmail.com

Abstract:

Diagnosing depression is a complex task whose success can be undermined both by a psychologist's lack of knowledge or experience and by contradictory or incomplete initial data from the patient. Expert and intelligent systems are being developed to address the latter problem. The goal of the study was to develop a technique based on machine learning algorithms for identifying depression among users of the social network Reddit. The problem is treated as a sentiment analysis task in which a text is assigned one of two tones: positive (the user's normal state) or negative (the user is depressed). The data preprocessing pipeline is analyzed, including data cleaning, tokenization, stop-word removal, lemmatization, and vectorization. Classical machine learning algorithms (the naive Bayes classifier, logistic regression, and the support vector machine) are considered alongside neural network algorithms (the multilayer perceptron, LSTM, and BERT). The hypothesis is that neural network algorithms can provide high accuracy. The models were developed in Python using the nltk, sklearn, keras, tensorflow, and transformers libraries. The results of a computational experiment are presented. The performance of the algorithms was compared using the recall, precision, and F1-score metrics. The accuracy of determining the sentiment of Reddit comments reached 97% for the LSTM network and 98% for the BERT network.
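
The preprocessing steps and evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: it uses only the Python standard library instead of nltk and sklearn, omits lemmatization, and relies on a tiny hypothetical stop-word list, an invented example comment, and made-up labels. Treating the "depressed" class as the target class for the metrics is also an assumption made here for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (a real pipeline would use nltk's).
STOP_WORDS = {"a", "an", "the", "i", "is", "am", "and", "to", "of", "so"}

def preprocess(text):
    """Clean, tokenize, and drop stop words (lemmatization is omitted here)."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # strip URLs
    text = re.sub(r"[^a-z\s]", " ", text)      # keep letters and whitespace only
    return [tok for tok in text.split() if tok not in STOP_WORDS]

def vectorize(tokens, vocabulary):
    """Bag-of-words count vector over a fixed, ordered vocabulary."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

def precision_recall_f1(y_true, y_pred, target=1):
    """Precision, recall, and F1 for the class labeled `target`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == target and p == target for t, p in pairs)
    fp = sum(t != target and p == target for t, p in pairs)
    fn = sum(t == target and p != target for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Invented example comment, not taken from the paper's dataset.
tokens = preprocess("I am so tired of everything... see https://example.com")
vocabulary = sorted(set(tokens))
vector = vectorize(tokens, vocabulary)

# Made-up labels for illustration: 1 = depressed, 0 = normal state.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(tokens, vector, precision, recall, f1)
```

In a fuller pipeline, count vectors like `vector` would be the input to a classifier such as naive Bayes or logistic regression, and the same three metrics would be computed on a held-out test set.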

Pages: 49-56

For citation

Lyalikova V.G., Bezryadin M.M., Golovanov D.Yu. Using machine learning methods to detect depression among users of the social network Reddit. Neurocomputers. 2024. V. 26. № 5. P. 49-56. DOI: https://doi.org/10.18127/j19998554-202405-05 (In Russian)

Date of receipt: 01.09.2024

Approved after review: 15.09.2024

Accepted for publication: 26.09.2024