Abstract
This article studies deep neural network (DNN) based enhancement of noisy Asian speech and its implementation in a hearing aid app. We use the THCHS-30 speech dataset, mixed with common daily-life noise, as training and testing data for the DNN. Because stacking the frequency-domain data of multiple adjacent audio frames can improve enhancement quality, the system compares different numbers of stacked frames during training and testing to find the best value. The influence of the number of training epochs on the PESQ score is likewise compared to obtain the best epoch count. On this basis, the best model is implemented in the hearing aid app, and the real-time performance of the device is tested. The experiments show that a DNN trained for an appropriate number of epochs, with an appropriate number of stacked frames, and transplanted to the hearing aid app can effectively improve speech clarity and intelligibility within a reasonable delay.
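The frame-stacking step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the DNN input is a magnitude spectrogram of shape (frames, frequency bins), and the function name `stack_frames` and the symmetric context size are our own illustrative choices.

```python
import numpy as np

def stack_frames(spec, context=3):
    """Stack each frame with `context` neighbors on each side.

    spec: (num_frames, num_bins) magnitude spectrogram.
    Returns (num_frames, (2*context+1)*num_bins) DNN input features.
    Edge frames are handled by repeating the first/last frame.
    """
    padded = np.pad(spec, ((context, context), (0, 0)), mode="edge")
    # one shifted view of the spectrogram per position in the window
    windows = [padded[i:i + spec.shape[0]] for i in range(2 * context + 1)]
    return np.concatenate(windows, axis=1)

# toy example: 5 frames, 4 frequency bins, 1 frame of context per side
feats = stack_frames(np.arange(20.0).reshape(5, 4), context=1)
print(feats.shape)  # (5, 12)
```

Searching over the context size, as the paper does for the number of stacked frames, then amounts to retraining the network on features built with different `context` values and comparing PESQ on the test set.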
Index Terms
Deep Neural Network Based Noised Asian Speech Enhancement and Its Implementation on a Hearing Aid App