Extraction of voice features of speech signal based on discrete wavelet transform

Journal: Achievements of Modern Radioelectronics, № 10, 2024

Type of article: scientific article

DOI: 10.18127/j20700784-202410-02

UDC: 004.093:57.087.1

Keywords: voice recognition systems; speech signal; voice features; discrete wavelet transform; approximation coefficients; detail coefficients; signal-to-noise ratio

Authors:

A.V. Korennoj1, S.M. Alshavva2, D.S. Yudakov3

1–3 Air Force Academy named after Professor N.E. Zhukovsky and Y.A. Gagarin (Voronezh, Russia)

1 korennoj@mail.ru, 2 yds12345@rambler.ru, 3 ashawasafwan7@gmail.com

Abstract:

In voice recognition systems, one of the most important stages is the extraction of informative features from the speech signal. Most known feature extraction methods are computationally complex, and their performance also degrades at low signal-to-noise ratios. This degradation reduces the accuracy with which subscriber voice models are formed and matched, and therefore the accuracy of the recognition system as a whole. The proposed method represents the spectral characteristics of the subscriber’s vocal tract filter (speech apparatus) by the approximation coefficients of the discrete wavelet transform of the logarithm of the speech signal spectrum. This allows the recognition system to operate effectively at low signal-to-noise ratios (down to 3 dB) with low computational requirements.
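The feature extraction idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Haar wavelet, the Hamming window, the 256-sample frame, the three decomposition levels, and the function names `haar_approximation` and `voice_features` are all assumptions chosen for illustration, since the abstract does not state these details. The sketch takes the logarithm of the magnitude spectrum of one frame and keeps only the approximation (low-pass) coefficients of its discrete wavelet transform, which retain the smooth spectral envelope associated with the vocal tract filter.

```python
import numpy as np


def haar_approximation(x, levels):
    """Approximation coefficients of a Haar DWT after `levels` decompositions.

    The Haar wavelet is an assumption made for simplicity; the abstract
    does not say which wavelet the method uses.
    """
    a = np.asarray(x, dtype=float)
    for _ in range(levels):
        if a.size % 2:
            # Odd length: repeat the last sample so samples pair up.
            a = np.append(a, a[-1])
        # Haar low-pass filter followed by downsampling by two.
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)
    return a


def voice_features(frame, levels=3, eps=1e-10):
    """Hypothetical feature vector for one speech frame.

    Steps: window the frame, take the magnitude spectrum, take its
    logarithm, and keep the DWT approximation coefficients.
    """
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hamming(frame.size)
    spectrum = np.abs(np.fft.rfft(windowed))
    log_spectrum = np.log(spectrum + eps)  # eps avoids log(0)
    return haar_approximation(log_spectrum, levels)


# Synthetic two-tone frame standing in for a voiced speech segment
# (assumed 8 kHz sampling rate, 256 samples).
fs = 8000
t = np.arange(256) / fs
frame = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)

features = voice_features(frame, levels=3)
print(features.shape)
```

With these assumed settings, the 129 spectral bins of a 256-sample frame are reduced to a short vector, which is consistent with the low computing requirements the abstract emphasizes. A real system would choose the wavelet and the number of levels to match the method described in the full paper.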

Pages: 10-16

For citation

Korennoj A.V., Alshavva S.M., Yudakov D.S. Extraction of voice features of speech signal based on discrete wavelet transform. Achievements of Modern Radioelectronics. 2024. V. 78. № 10. P. 10–16. DOI: https://doi.org/10.18127/j20700784-202410-02 [in Russian]

Date of receipt: 02.09.2024

Approved after review: 12.09.2024

Accepted for publication: 24.09.2024