Yu.G. Spazhakin, L.T. Sushkova
One of the main steps in modeling a speaker's vocal tract parameters for automatic verification is extracting the informative speech fragment, that is, determining the speech boundaries within surrounding noise and sound artifacts.
This article considers the preliminary analysis block for voice commands in a text-dependent speaker verification system, which determines the speech boundaries.
Traditionally, this task is solved with methods based on analysis of the short-term average energy of the signal. More advanced systems supplement energy-level estimation with analysis of the zero-crossing rate and the spectral power of the signal. However, none of these methods determines speech boundaries with high accuracy, because they are unstable on noisy signals. This leads to errors in modeling the speaker's voice and reduces verification accuracy.
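The traditional features mentioned above can be illustrated with a minimal sketch. It computes the short-term average energy and zero-crossing rate per frame and takes the first and last frames whose energy exceeds a threshold as the speech boundaries. The function names, frame length, hop size, and threshold are assumptions made for illustration and do not come from the article.

```python
def split_frames(samples, frame_len, hop):
    """Split samples into frames of frame_len taken every hop samples.

    Trailing samples that do not fill a whole frame are dropped.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def short_term_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def energy_boundaries(samples, frame_len, hop, threshold):
    """Return (start, end) sample indices of the region whose frames
    exceed the energy threshold, or None if no frame does.

    The threshold is a hypothetical tuning parameter.
    """
    frames = split_frames(samples, frame_len, hop)
    active = [i for i, f in enumerate(frames)
              if short_term_energy(f) > threshold]
    if not active:
        return None
    return active[0] * hop, active[-1] * hop + frame_len
```

A detector of this kind depends on a single fixed threshold, which is one reason it degrades when the background noise level approaches the level of the speech.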
This work proposes a method for extracting the informative speech fragment based on a tone/noise detector, a pitch frequency meter, and analysis of the spectral entropy, short-term average energy, and zero-crossing rate of the speech signal. The voice command analysis block processes these features in parallel. If, during boundary determination, signal bursts falsely identified as voiced speech appear at the output of the pitch frequency detector, the data segment is analyzed by energy level, zero-crossing rate, spectral entropy, and the duration of the continuous signal fragment. The best hypothesis under collective decision making is accepted as true, and the segment is classified as a boundary of the voice command, a fragment of intense background noise, or a recording artifact.
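The abstract does not give the decision rule or any threshold values, so the following is only a sketch of one possible scheme. It computes a normalized spectral entropy (close to 0 for a tonal frame, close to 1 for a flat, noise-like spectrum) and combines it with energy, zero-crossing rate, and duration in a simple majority vote. The function names, the voting rule, and every threshold are hypothetical.

```python
import cmath
import math


def power_spectrum(frame):
    """Power spectrum of a real frame via a direct DFT (bins 0..N/2)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(frame))
        spectrum.append(abs(s) ** 2)
    return spectrum


def spectral_entropy(frame):
    """Shannon entropy of the normalized power spectrum, scaled to [0, 1]."""
    if len(frame) < 2:
        raise ValueError("frame must contain at least 2 samples")
    power = power_spectrum(frame)
    total = sum(power)
    if total == 0:
        return 0.0  # silent frame: no spectral distribution to measure
    probs = [p / total for p in power]
    entropy = -sum(q * math.log2(q) for q in probs if q > 0)
    return entropy / math.log2(len(probs))


def classify_segment(energy, zcr, entropy, duration_s,
                     energy_thr=0.01, zcr_thr=0.3,
                     entropy_thr=0.6, min_duration_s=0.05):
    """Hypothetical majority vote over the features named in the text.

    All thresholds are illustrative placeholders, not values from the article.
    """
    if duration_s < min_duration_s:
        return "artifact"  # too short to be part of a voice command
    speech_votes = [energy > energy_thr,
                    zcr < zcr_thr,
                    entropy < entropy_thr]
    if sum(speech_votes) >= 2:
        return "speech_boundary"
    return "background_noise"
```

In practice the direct DFT would be replaced by an FFT routine, and the thresholds would have to be tuned on recorded data.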
The proposed method extracts the informative speech fragment with high accuracy: the error in determining speech boundaries is 1 to 5%.