V. N. Kiroy – Dr.Sc. (Biol.), Professor, Head of the Scientific Research Technological Center of Neurotechnologies, Southern Federal University (Rostov-on-Don, Russia)
E-mail: kiroy@sfedu.ru
O. M. Bakhtin – Ph.D. (Biol.), Senior Research Scientist, Scientific Research Technological Center of Neurotechnologies, Southern Federal University (Rostov-on-Don, Russia)
I. E. Shepelev – Ph.D. (Eng.), Leading Research Scientist, Scientific Research Technological Center of Neurotechnologies, Southern Federal University (Rostov-on-Don, Russia)
D. G. Shaposhnikov – Ph.D. (Eng.), Leading Research Scientist, Scientific Research Technological Center of Neurotechnologies, Southern Federal University (Rostov-on-Don, Russia)
It is well known that speech largely reflects a person's internal state, including the emotions he or she experiences. Elucidating the relationship between the acoustic and intonational characteristics of speech and the emotions experienced is therefore essential for a number of applied problems, in particular the objective assessment of a person's functional state in various fields of activity. This is especially important in situations where inadequate emotional reactions of a human operator can threaten both his own life and the lives of people around him. Assessing a person's emotional state from speech intonation is thus directly relevant to ensuring safety in operator activities (drivers, pilots, dispatchers, etc.).
The purpose of this study is to develop a neural network classifier for recognizing weak human emotions from the characteristics of a speech message. Mel-frequency cepstral coefficients (MFCCs) were used as the acoustic-frequency characteristics of emotions and as the input parameters of the classifier. A series of psychophysiological tests was conducted in which subjects assessed how informative the scene shown in an image was; a group of experts then rated the degree of confidence conveyed by each subject's voice message describing the scene. Part of the test results was used to train the classifier, and the rest to evaluate its performance. A sketch of such a pipeline is given below.
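The sketch below illustrates the general shape of the pipeline just described, not the authors' exact model: an MFCC feature vector is computed per utterance and fed to a small feed-forward network that labels each voice message "certain" or "uncertain". The 13-coefficient MFCC choice, the time-averaging of coefficients, the single 32-unit hidden layer, and the 70/30 train/test split are illustrative assumptions (the abstract does not specify them), and synthetic random signals stand in for the recorded voice messages.

```python
# Minimal illustrative sketch, NOT the authors' exact model: MFCC features
# plus a small feed-forward classifier for "certain" vs. "uncertain" speech.
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

SR = 16000  # assumed sampling rate of the voice recordings

def mfcc_vector(signal, sr=SR, n_mfcc=13):
    """Average the MFCC matrix over time: one feature vector per utterance."""
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, n_frames)
    return mfcc.mean(axis=1)

# Stand-in corpus: 40 two-second synthetic "utterances" with binary labels
# (1 = "certain", 0 = "uncertain"). In the real study these would be the
# subjects' voice messages and the experts' confidence ratings.
rng = np.random.default_rng(0)
signals = [rng.standard_normal(2 * SR) for _ in range(40)]
labels = rng.integers(0, 2, size=40)

X = np.array([mfcc_vector(s) for s in signals])
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.3, random_state=0)

# One hidden layer of 32 units is an assumption for illustration only.
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

On random noise this of course yields chance-level accuracy; the point is only the structure of the pipeline (per-utterance MFCC vectors in, a binary confidence label out), which matches the feature/classifier arrangement described in the abstract.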
Analysis of the automatic classification of intonations as “certain” versus “uncertain” showed a high level of recognition (up to 70%), which indicates the promise of the neural network approach for automatically discriminating weak emotions by their manifestation in speech.
Kiroy V.N., Bakhtin O.M., Shepelev I.E., Shaposhnikov D.G. Application of the neural network classifier for monitoring the psycho-emotional state of a person based on speech analysis. Neurocomputers. 2020. Vol. 22. No. 5. P. 30–42. DOI: 10.18127/j19998554-202005-03. (in Russian)