Quasi-optimal algorithm for joint filtering of a speech signal and detecting pauses

Journal: Antennas, No. 6, 2023

Type of article: scientific article

DOI: https://doi.org/10.18127/j03209601-202306-07

UDC: 004.093:57.087.1

Keywords: speech signal; segmentation; Kalman filter; spectral density; error variance; signal-to-noise ratio; detection threshold; error probability

Authors:

A. V. Korennoj¹, D. S. Yudakov², S. M. Alshavva³
¹⁻³ Air Force Academy named after Professor N.E. Zhukovsky and Yu.A. Gagarin (Voronezh, Russia)

¹ korennoj@mail.ru, ² yds12345@rambler.ru, ³ ashawasafwan7@gmail.com

Abstract:

In voice authentication systems, one of the most important processing steps is segmenting the input signal into speech and pause intervals. Most known algorithms work only at sufficiently high signal-to-noise ratios, and constructing an optimal detection algorithm is complicated by the choice of a reference copy of the signal, since each person's speech has unique features. The proposed algorithm forms an estimate of the speech signal with a Kalman filter and uses this estimate as the reference copy of the signal at the detection stage. The developed algorithm performs speech/pause segmentation at low signal-to-noise ratios, which is confirmed by experimental studies.
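
The idea in the abstract (filter first, then use the filtered estimate as the reference copy in the detector) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the article: the speech model is a first-order autoregressive process, the parameters a, q and r, the frame length and the threshold are arbitrary, and the function names kalman_ar1 and detect_speech are hypothetical.

```python
import numpy as np

def kalman_ar1(y, a, q, r):
    """Scalar Kalman filter for the assumed model
    x[k] = a*x[k-1] + w[k] (var q),  y[k] = x[k] + v[k] (var r)."""
    x_hat = np.zeros(len(y))
    x, p = 0.0, q / (1.0 - a * a)  # start from the stationary prior variance
    for k, yk in enumerate(y):
        x_pred = a * x                      # predict the state
        p_pred = a * a * p + q              # predict the error variance
        gain = p_pred / (p_pred + r)        # Kalman gain
        x = x_pred + gain * (yk - x_pred)   # update with the innovation
        p = (1.0 - gain) * p_pred
        x_hat[k] = x
    return x_hat

def detect_speech(y, x_hat, frame_len, threshold):
    """Correlate each frame of the observation with the estimate,
    which plays the role of the reference copy, and apply a threshold."""
    n_frames = len(y) // frame_len
    y_f = y[: n_frames * frame_len].reshape(n_frames, frame_len)
    x_f = x_hat[: n_frames * frame_len].reshape(n_frames, frame_len)
    stat = np.mean(y_f * x_f, axis=1)
    return stat > threshold, stat

# Synthetic test signal (assumed): AR(1) "speech" in the middle, pauses at both ends.
rng = np.random.default_rng(0)
n, a, r = 8000, 0.9, 1.0
q = 1.0 - a * a                             # unit signal variance, i.e. 0 dB SNR
s = np.zeros(n)
for k in range(1, n):
    s[k] = a * s[k - 1] + rng.normal(scale=np.sqrt(q))
active = np.zeros(n)
active[2000:6000] = 1.0                     # speech occupies frames 5..14
y = active * s + rng.normal(scale=np.sqrt(r), size=n)

x_hat = kalman_ar1(y, a, q, r)
is_speech, stat = detect_speech(y, x_hat, frame_len=400, threshold=0.65)
```

In this sketch the filter always assumes that speech is present, so during pauses the estimate is only weakly correlated with the observation and the per-frame statistic drops. That gap is what the threshold exploits. Individual frames can still be misclassified, and choosing the threshold for a given error probability is outside the scope of the sketch.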

Pages: 61-67

For citation

Korennoj A.V., Yudakov D.S., Alshavva S.M. Quasi-optimal algorithm for joint filtering of a speech signal and detecting pauses. Antennas. 2023. № 6. P. 61–67. DOI: https://doi.org/10.18127/j03209601-202306-07 (in Russian)

References

Beigi H. Fundamentals of speaker recognition. Springer Science + Business Media, LLC. 2011.
Campbell J.P. Speaker recognition: A tutorial. Proc. IEEE. 1997. V. 85. № 9. P. 1437–1462.
Atal B., Rabiner L.R. A pattern recognition approach to voiced-unvoiced-silence classification with applications to speech recognition. IEEE Transactions on Acoustics, Speech and Signal Processing. 1976. V. 24 (3). P. 201–212.
Childers D.G., Hand M., Larar J.M. Silent and voiced/unvoiced/mixed excitation (four-way) classification of speech. IEEE Transactions on Acoustics, Speech and Signal Processing. 1989. V. 37 (11). P. 1771–1774.
Alimuradov A.K., Tychkov A.Ju. Algoritm segmentacii rech'/pauza na osnove dekompozicii na empiricheskie mody i odnomernogo rasstoyaniya Makhalanobisa. Trudy MFTI. 2021. T. 13. № 3. S. 5–22. (in Russian)
Trifonov A.P., Shinakov Yu.S. Sovmestnoe razlichenie signalov i ocenka ih parametrov na fone pomeh. M.: Radio i svjaz'. 1986. (in Russian)
Korennoj A.V., Kuleshov S.A. Osnovy statisticheskoj teorii radiotehnicheskih sistem: Ucheb. posobie. Pod red. A.V. Korennogo. M.: Radiotekhnika. 2021. (in Russian)
Shejkin R.L. K analizu mekhanizmov vozniknoveniya pauz v rechi. Mehanizmy recheobrazovaniya i vosprijatiya slozhnyh zvukov. 1966. S. 31–44. (in Russian)
Gonzalez S., Brookes M. PEFAC – a pitch estimation algorithm robust to high levels of noise. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2014. V. 22. № 2. P. 518–530.
Harel M., Dov D., Cohen I., Meir R., Talmon R. Voiced-unvoiced-silence classification via hierarchical dual geometry analysis. ISCEE International Conference on the Science of Electrical Engineering. Technion City, Haifa. 2016.

Date of receipt: 10.10.2023

Approved after review: 02.11.2023

Accepted for publication: 21.11.2023