Method of informative speech signal extraction in tasks of automatic speaker verification

Journal Biomedical Radioelectronics, No. 6, 2012

Keywords: speech recognition, speaker verification, speech borders

Authors:

Yu.G. Spazhakin, L.T. Sushkova

Abstract:

One of the main steps in modeling a speaker's vocal tract parameters for automatic verification is extracting the informative speech fragment, that is, determining the speech borders within surrounding noise and sound artifacts. This article considers the preliminary analysis block for voice commands in a text-dependent speaker verification system, which determines the speech borders. Traditionally, this task is solved with methods based on the short-term average energy of the signal. More advanced systems supplement the energy level estimate with analysis of the zero-crossing rate and the spectral power of the signal. However, none of these methods can determine speech borders with high accuracy, because they are unstable on noisy signals. This leads to errors in modeling the speaker's voice and reduces verification accuracy. This work proposes a method of informative speech fragment extraction based on a tone/noise detector, a pitch frequency meter, and analysis of the spectral entropy, short-term average energy, and zero-crossing rate of the speech signal. The voice command analysis block processes these features in parallel. If, during border determination, the pitch frequency detector outputs signal hits that are falsely identified as voiced speech, the data segment is analyzed by its energy level, zero-crossing rate, spectral entropy, and the duration of the continuous signal fragment. The best hypothesis under collective decision making is accepted as true, and the segment is classified as a border of the voice command, a fragment of intense background noise, or a recording artifact. The proposed method extracts the informative speech fragment with high accuracy: the error in determining speech borders is 1-5 %.
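
The three frame-level features named in the abstract (short-term average energy, zero-crossing rate, and spectral entropy) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the frame length of 64 samples, and the use of non-overlapping frames are assumptions made for illustration, and the tone/noise detector, pitch frequency meter, and collective decision stage are left out.

```python
import cmath
import math


def short_term_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def spectral_entropy(frame):
    """Shannon entropy (in bits) of the normalized power spectrum.

    A naive O(n^2) DFT keeps the sketch free of external libraries.
    """
    n = len(frame)
    power = []
    for k in range(1, n // 2):  # skip the DC bin
        s = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(frame)
        )
        power.append(abs(s) ** 2)
    total = sum(power)
    if total == 0:
        return 0.0  # an all-zero frame carries no spectral information
    probs = [p / total for p in power]
    return -sum(p * math.log2(p) for p in probs if p > 0)


def frame_features(signal, frame_len=64):
    """Compute (energy, ZCR, entropy) for each non-overlapping frame.

    frame_len=64 is an assumed value, not one taken from the article.
    """
    feats = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        feats.append(
            (
                short_term_energy(frame),
                zero_crossing_rate(frame),
                spectral_entropy(frame),
            )
        )
    return feats
```

A pure tone concentrates its power in one spectral bin and so has entropy near zero, while a broadband frame spreads power across bins and has high entropy. This contrast is what makes spectral entropy a useful complement to energy when the background noise is loud.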

Pages: 68-77
