Journal Radioengineering, No. 9, 2012
Article in issue:
Algorithm for speech command recognition in noise based on cross-correlation portraits using the Fourier transform
Authors:
E.U. Lebedeva, A.I. Armer, A.P. Eropheev
Abstract:
A great number of speech command (SC) recognition systems currently exist [1, 2, 3]. The recognition accuracy of such systems ranges from 95 to 99% and depends directly on the level of acoustic noise against which the SC is spoken. The drawback of most of these systems is that they are of little use under strong noise, so improving the noise immunity of SC recognition is a pressing problem. One noise-resistant method being developed by our research group identifies SCs after transforming them into special cross-correlation portraits (CCP) [2]. This article proposes preprocessing the CCP with the Fourier transform to improve SC recognition quality against a background of acoustic noise. A study of the rows of the command CCPs showed that they are close to periodic signals. If each row of a CCP is decomposed into a sum of periodic trigonometric functions (sinusoids) [4] and only the sinusoids with the largest decomposition coefficients are retained, the main frequencies of the signal are isolated in each row. After the insignificant coefficients are removed from the spectrum of each row, a new portrait is assembled from the resulting spectra. This transform of the CCP increased the probability of correct SC recognition by reducing the influence of variability between different pronunciations of the speech signals. To evaluate the efficiency of the described preprocessing, a recognition experiment was carried out on a dictionary of 10 SCs. In total, 1000 SCs were recognized, 100 utterances of each command. The signal-to-noise ratio was 4. Preprocessing the CCP with the Fourier transform raised the rate of correct recognition from 96.5% to 97.9%. Student's t-test at a significance level of 0.05 confirmed the increase in the probability of correct recognition, and hence the expediency of the proposed CCP preprocessing.
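The preprocessing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the decomposition into sinusoids is a plain discrete Fourier transform of each line (row) of the CCP, that the new portrait is built from coefficient magnitudes, and that a fixed number of coefficients is kept per row. The abstract specifies none of these details. The names `dft`, `keep_largest`, `preprocess_portrait` and the parameter `keep` are hypothetical, and the toy portrait is synthetic.

```python
import cmath
import math


def dft(row):
    """Discrete Fourier transform of one portrait row (O(n^2), enough for a sketch)."""
    n = len(row)
    return [
        sum(row[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def keep_largest(spectrum, keep):
    """Zero every coefficient except the `keep` largest by magnitude."""
    ranked = sorted(range(len(spectrum)), key=lambda k: abs(spectrum[k]), reverse=True)
    kept = set(ranked[:keep])
    return [c if k in kept else 0j for k, c in enumerate(spectrum)]


def preprocess_portrait(portrait, keep):
    """Assumed variant: the new portrait holds the pruned magnitude spectrum of each row."""
    return [[abs(c) for c in keep_largest(dft(row), keep)] for row in portrait]


# Synthetic stand-in for a CCP: each row is a dominant sinusoid plus a weak one.
n = 32
portrait = [
    [
        math.sin(2 * math.pi * f * t / n) + 0.1 * math.sin(2 * math.pi * 7 * t / n)
        for t in range(n)
    ]
    for f in (2, 3)
]

# Keep two coefficients per row: the dominant frequency and its mirror bin.
pruned = preprocess_portrait(portrait, keep=2)
```

In this toy case the weak component at frequency 7 is discarded and only the dominant frequency of each row survives, which is the effect the abstract attributes to removing insignificant coefficients. How many coefficients to keep in practice would have to be chosen experimentally.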
Pages: 41-44
References
  1. Ronzhin A. L., Li I. V. Automatic recognition of Russian speech // Herald of the Russian Academy of Sciences. 2007. V. 77. No. 2. P. 133 - 138.
  2. http://www.connect.ru
  3. Armer A. I. Modeling and recognition of speech signals against a background of intense interference // Dissertation for the degree of Candidate of Technical Sciences. Ulyanovsk, 2006. 168 p.
  4. http://robonews.info
  5. http://www.pcweek.ua
  6. Krasheninnikov V. R., Armer A. I., Kuznetsov V. V., Lebedeva E. U. Cross-Correlation Portraits of Speech Signals in Modal-Based Speech Recognition / Proceedings of 10th International Conference on Pattern Recognition and Image Analysis: New Information Technologies, PRIA-8-2007. St-Petersburg, POLITECHNICA. 2010. V. I. P. 105 - 108.