Generation of Voice Signal Tone Sandhi and Melody Based on Convolutional Neural Network

Published: 08 May 2023

Abstract

There is a need to prevent the use of modulated voice signals to conduct criminal activities. This article proposes voice signal change detection based on convolutional neural networks. Three commonly used voice processing tools (Audacity, CoolEdit, and RTISI) are used to pitch-shift the voices in the speech libraries. Each voice is raised at five levels, recorded as +4, +5, +6, +7, and +8 semitones, and likewise lowered at five levels, recorded as −4, −5, −6, −7, and −8 semitones. Through experiments, the convolutional neural network corresponding to network b-3 is selected as the final classifier. Its average accuracy A1 over the three categories exceeds 97%, its detection accuracy A2 on electronically pitch-shifted speech exceeds 97%, and its false alarm rate on original speech is below 1.9%. These results show that the proposed detection algorithm is effective and has good generalization ability.
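The pitch shifts described above are multiples of a semitone, where each semitone corresponds to a frequency ratio of 2^(1/12). The following is a minimal, simplified sketch of that relationship using plain resampling; it is not the method used by the tools in the paper (Audacity, CoolEdit, and RTISI apply time-scale modification so that duration is preserved, which this sketch omits), and the function names are illustrative only.

```python
import numpy as np

def semitone_ratio(n_steps):
    # Each semitone corresponds to a frequency ratio of 2^(1/12),
    # so +7 semitones multiplies every frequency by 2^(7/12) ≈ 1.498.
    return 2.0 ** (n_steps / 12.0)

def naive_pitch_shift(signal, n_steps):
    # Resample the waveform by the semitone ratio. This raises or lowers
    # pitch but also changes duration; real pitch-shifting software
    # compensates with time-scale modification, omitted here.
    ratio = semitone_ratio(n_steps)
    old_idx = np.arange(len(signal))
    new_idx = np.arange(0, len(signal), ratio)
    return np.interp(new_idx, old_idx, signal)

sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)      # 1 s, 440 Hz test tone
shifted = naive_pitch_shift(tone, +7)   # raise by 7 semitones
```

Raising by a positive number of semitones shortens the naive resampled signal, which is exactly the duration artifact that dedicated pitch-shifting tools correct for.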
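The three reported figures of merit can be read off a three-class confusion matrix (original, raised-pitch, and lowered-pitch speech). A sketch with hypothetical counts, chosen only to be consistent with the thresholds reported in the abstract, not taken from the paper:

```python
import numpy as np

# Hypothetical confusion matrix: rows are true classes, columns are
# predicted classes. Assumed class order: original, raised, lowered.
cm = np.array([
    [990,   6,   4],   # original speech
    [  8, 985,   7],   # raised-pitch speech
    [  5,   9, 986],   # lowered-pitch speech
])

# A1: average per-class accuracy (recall averaged over the three classes).
per_class_acc = np.diag(cm) / cm.sum(axis=1)
A1 = per_class_acc.mean()

# A2: detection accuracy on electronically pitch-shifted speech, i.e. the
# fraction of shifted utterances (rows 1-2) predicted as shifted (cols 1-2).
shifted_rows = cm[1:, :]
A2 = shifted_rows[:, 1:].sum() / shifted_rows.sum()

# False alarm rate: fraction of original utterances flagged as shifted.
far = cm[0, 1:].sum() / cm[0].sum()
```

With these illustrative counts, A1 ≈ 0.987, A2 ≈ 0.994, and the false alarm rate is 0.01, matching the reported bounds (A1 > 97%, A2 > 97%, false alarms < 1.9%).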


Published in

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 5
May 2023, 653 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3596451


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 1 March 2022
• Revised: 6 May 2022
• Accepted: 6 June 2022
• Online AM: 19 September 2022
• Published: 8 May 2023
