Publishing house Radiotekhnika


Tel.: +7 (495) 625-9241


Real-time voice conversion using artificial neural network with rectified linear units


I. S. Azarov – Ph.D. (Eng.), Belarusian State University of Informatics and Radioelectronics.
M. I. Vashkevich – Assistant, Belarusian State University of Informatics and Radioelectronics.
A. A. Petrovsky – Dr.Sc. (Eng.), Professor, Belarusian State University of Informatics and Radioelectronics.

The paper presents a voice conversion technique suitable for real-time applications. The technique transforms short-time spectral envelopes of speech using an artificial neural network with rectified linear units (ReLU). A special network configuration takes the speaker's temporary states into account. Speech is represented by instantaneous parameters of the harmonic plus noise model. The proposed technique is compared with the main alternative techniques using objective and subjective measures.
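To make the core idea concrete, the following is a minimal sketch of a feedforward network with rectified linear units that maps one source spectral envelope frame to a converted frame. It is an illustration under stated assumptions, not the authors' implementation: the layer sizes, the random untrained weights, and the names `convert_envelope`, `init_layer`, `ENVELOPE_DIM` and `HIDDEN_DIM` are all hypothetical, and neither the special network configuration for speaker states nor the harmonic plus noise analysis is reproduced.

```python
import random


def relu(x):
    """Rectified linear unit: max(0, v) applied to every element."""
    return [v if v > 0.0 else 0.0 for v in x]


def affine(weights, bias, x):
    """Compute W x + b for a weight matrix stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases (hypothetical initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def convert_envelope(layers, envelope):
    """Map one source envelope frame to a converted frame.

    Hidden layers use ReLU; the output layer is left linear so the
    result is not clipped at zero.
    """
    h = envelope
    for weights, bias in layers[:-1]:
        h = relu(affine(weights, bias, h))
    weights, bias = layers[-1]
    return affine(weights, bias, h)


rng = random.Random(0)
ENVELOPE_DIM, HIDDEN_DIM = 8, 16  # illustrative sizes, not taken from the paper
layers = [
    init_layer(HIDDEN_DIM, ENVELOPE_DIM, rng),
    init_layer(HIDDEN_DIM, HIDDEN_DIM, rng),
    init_layer(ENVELOPE_DIM, HIDDEN_DIM, rng),
]

# A made-up envelope frame standing in for real analysis output.
source_frame = [rng.uniform(0.0, 1.0) for _ in range(ENVELOPE_DIM)]
converted_frame = convert_envelope(layers, source_frame)
```

In a real system the weights would be trained on time-aligned frames from a source and a target speaker. Because each frame needs only a few matrix-vector products, a per-frame mapping of this kind fits naturally into a real-time processing loop.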


May 29, 2020
