Abstract
There is a need to prevent the use of pitch-modified voice signals in criminal activities. We propose a voice signal change detection method based on convolutional neural networks. Three commonly used voice processing tools (Audacity, CoolEdit, and RTISI) are used to change the pitch of recordings in the voice libraries. Each voice is raised at five semitone levels (recorded as +4, +5, +6, +7, and +8) and, likewise, lowered at five semitone levels (recorded as −4, −5, −6, −7, and −8). Through experiments, the convolutional neural network corresponding to network b-3 is selected as the final classifier. Its average accuracy A1 over the three categories exceeds 97%, its detection accuracy A2 on electronically pitch-shifted speech exceeds 97%, and its false alarm rate on original speech is below 1.9%. These results show that the proposed detection algorithm is effective and generalizes well.
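The abstract describes building the training data by shifting each recording up or down by four to eight semitones. As a minimal sketch of that manipulation (not the phase-vocoder algorithms that tools like Audacity or RTISI actually use), the following hypothetical NumPy snippet pitch-shifts a signal by resampling; note this naive method also changes the signal's duration, unlike the time-preserving shifts the paper's tools apply:

```python
import numpy as np

def pitch_shift_resample(y, n_steps):
    """Naively shift a signal's pitch by n_steps semitones via resampling.

    A shift of n semitones scales every frequency by 2**(n/12). Resampling
    achieves this but also shortens/lengthens the signal, so this is only
    an illustration of the +/-4..8 semitone levels, not the real pipeline.
    """
    rate = 2.0 ** (n_steps / 12.0)          # frequency scaling factor
    old_idx = np.arange(len(y))
    new_idx = np.arange(0.0, len(y), rate)  # resampling grid
    return np.interp(new_idx, old_idx, y)

def dominant_hz(x, sr):
    """Return the frequency (Hz) of the strongest FFT bin."""
    spec = np.abs(np.fft.rfft(x))
    return np.argmax(spec) * sr / len(x)

# Example: a 440 Hz tone raised by +4 semitones should land near
# 440 * 2**(4/12) ~= 554 Hz.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
shifted = pitch_shift_resample(tone, +4)
print(dominant_hz(shifted, sr))
```

In the paper's setting, each shifted copy would be saved alongside the original and labeled by its semitone level, giving the CNN positive (modified) and negative (original) examples.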
Generation of Voice Signal Tone Sandhi and Melody Based on Convolutional Neural Network