Voice activity detector based on voting Gaussian mixture models


Journal: Electromagnetic Waves and Electronic Systems, No. 8, 2015.


Keywords: speech signal, voice activity detector, GMM, EER, noise

Authors:

S.A. Kravtsov - Post-graduate Student, P.G. Demidov Yaroslavl State University. E-mail: sk860@outlook.com
A.I. Topnikov - Ph. D. (Eng.), Assistant, P.G. Demidov Yaroslavl State University. E-mail: topartgroup@gmail.com
A.L. Priorov - Dr. Sc. (Eng.), Associate Professor, P.G. Demidov Yaroslavl State University. E-mail: andcat@yandex.ru

Abstract:

Robust voice activity detection is very important for automatic speech processing systems. Modern voice activity detectors (VADs) commonly have serious difficulty segmenting the signal correctly when they are applied to noisy speech interleaved with non-speech fragments. There are two main types of VAD, distinguished by their decision rule: the first compares some feature value with a threshold, and the second uses a classifier, for example one obtained by machine learning. The usual way of applying Gaussian mixture models (GMMs) is to train two models: a speech model and a non-speech model. In our approach, the voice activity detector uses mel-frequency cepstral coefficients (MFCC) and the spectral flatness measure (SFM) as features and performs classification with GMMs. The distinctive feature of the proposed algorithm is that, during training, several models are built for individual signal-to-noise ratio (SNR) bands, and the final decision is made by voting among these models. The speech model and the non-speech model trained for each SNR band form a group, so the number of groups equals the number of SNR bands. Each signal fragment is passed to all groups. Each group decides between speech and non-speech by comparing log-likelihoods: if the log-likelihood of the speech model is greater than that of the non-speech model, the group's vote is 1; otherwise it is 0. The final decision is made by comparing the number of votes with a threshold. Although every group's vote carries the same value, the groups differ in accuracy; to improve overall accuracy, we therefore propose assigning a weight to each vote. The algorithm was compared with the original two-model GMM classification over the SNR range from −15 to 25 dB. The detection accuracy of our classification approach is higher than that of the original two-model GMM classification.
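The per-band voting rule described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes diagonal-covariance GMMs stored as hypothetical `(weights, means, variances)` tuples, a frame already converted into a feature vector (for example MFCC plus SFM), per-group vote weights supplied by the caller (for instance derived from each group's accuracy on held-out data), and a "weighted score at or above the threshold means speech" convention. The abstract does not specify how the weights or the threshold are chosen.

```python
import numpy as np


def diag_gmm_loglik(x, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM.

    weights: (K,), means: (K, D), variances: (K, D).
    """
    x = np.asarray(x, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    # log N(x | mu_k, diag(var_k)) for every component k
    log_norm = -0.5 * (np.log(2.0 * np.pi * variances).sum(axis=1)
                       + ((x - means) ** 2 / variances).sum(axis=1))
    a = np.log(np.asarray(weights, dtype=float)) + log_norm
    m = a.max()
    return float(m + np.log(np.exp(a - m).sum()))  # numerically stable log-sum-exp


def weighted_vote_vad(x, groups, vote_weights, threshold):
    """Decide speech/non-speech for one frame by weighted voting.

    groups: one (speech_gmm, nonspeech_gmm) pair per SNR band; each GMM is a
    hypothetical (weights, means, variances) tuple.
    vote_weights: one weight per group (assumed given, e.g. from validation accuracy).
    """
    # Each group votes 1 if its speech model is more likely than its non-speech model.
    votes = np.array([
        1.0 if diag_gmm_loglik(x, *speech) > diag_gmm_loglik(x, *nonspeech) else 0.0
        for speech, nonspeech in groups
    ])
    score = float(np.dot(vote_weights, votes))
    # Assumed convention: a weighted vote total at or above the threshold means speech.
    return score >= threshold, votes


def toy_gmm(mu):
    """Hypothetical single-component, unit-variance GMM for illustration only."""
    return ([1.0], [mu], [[1.0] * len(mu)])


groups = [
    (toy_gmm([1.0, 1.0]), toy_gmm([-1.0, -1.0])),  # toy group for one SNR band
    (toy_gmm([5.0, 5.0]), toy_gmm([0.0, 0.0])),    # toy group for another SNR band
]
is_speech, votes = weighted_vote_vad([1.0, 1.0], groups, vote_weights=[0.7, 0.3], threshold=0.5)
# is_speech -> True, votes -> array([1., 0.])
```

With all vote weights equal to 1, the weighted total reduces to the plain count of "speech" votes compared against the threshold, i.e. the unweighted voting variant.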
The paper presents the results of studying the algorithm and comparing it with the original GMM classification for speech corrupted by additive white Gaussian noise (AWGN) and by several noise types from the Noisex-92 library. Over the SNR range from −15 to 20 dB, the equal error rate (EER) of our weighted voting algorithm is lower than that of the GMM classifier by about 0.7 percentage points on average for AWGN and by about 1.95 percentage points on average for the noise of a running Volvo car engine.
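The EER used in this comparison is the operating point at which the false-acceptance rate (non-speech frames labelled as speech) equals the false-rejection rate (speech frames labelled as non-speech); multiplied by 100, it is expressed in percent, so differences are in percentage points. Below is a minimal sketch of estimating it from per-frame detector scores, assuming that a higher score means "more speech-like" (for a voting VAD this could be, for example, the weighted vote total) and using a simple sweep over observed scores rather than interpolation. The abstract does not describe the paper's exact evaluation procedure.

```python
import numpy as np


def equal_error_rate(scores, labels):
    """Approximate EER from per-frame detector scores.

    scores: higher means "more speech-like" (assumed convention).
    labels: ground truth, True/1 for speech frames.
    Sweeps every observed score as a threshold and returns the mean of the
    false-acceptance and false-rejection rates where they are closest.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    n_speech = int(labels.sum())
    n_nonspeech = int((~labels).sum())
    best_gap, eer = np.inf, 1.0
    for t in np.unique(scores):
        decided_speech = scores >= t
        # False acceptance: non-speech frames labelled as speech.
        far = np.sum(decided_speech & ~labels) / n_nonspeech
        # False rejection: speech frames labelled as non-speech.
        frr = np.sum(~decided_speech & labels) / n_speech
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return float(eer)


# Toy example: perfectly separable frames give an EER of 0.
print(equal_error_rate([0.9, 0.8, 0.1, 0.2], [1, 1, 0, 0]))  # 0.0
```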

Pages: 29-34
