Deep Neural Network Based Noised Asian Speech Enhancement and Its Implementation on a Hearing Aid App

Published: 21 July 2021

Abstract

This article studies noisy Asian speech enhancement based on a deep neural network (DNN) and its implementation in a mobile app. We use the THCHS-30 speech dataset, mixed with a dataset of common everyday noises, as training and testing data for the DNN. Because stacking the frequency-domain data of multiple neighboring audio frames can improve enhancement quality, the system compares different numbers of stacked frames during training and testing to find the best setting. We also examine how the number of training epochs affects the PESQ score and identify the best epoch count. On this basis, the best model is deployed in a hearing aid app, and the real-time performance of the device is measured. The experiments show that a DNN trained with an appropriate number of epochs and an appropriate number of stacked audio frames, when ported to the hearing aid app, can effectively improve speech clarity and intelligibility within an acceptable latency range.
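The frame-stacking idea described above can be sketched as follows. This is a minimal illustration only, assuming log-magnitude STFT features as DNN input; the function name `stack_frames` and the context width of 3 frames per side are hypothetical choices for this sketch, not details taken from the paper:

```python
import numpy as np

def stack_frames(log_mag, context=3):
    """Concatenate each frame with `context` neighboring frames on each
    side, so the DNN sees 2*context + 1 frames of spectral context per
    input vector. Edge frames are padded by repetition."""
    n_frames, n_bins = log_mag.shape
    padded = np.pad(log_mag, ((context, context), (0, 0)), mode="edge")
    # One shifted view per context offset, stacked along a new axis,
    # then flattened into a single feature vector per frame.
    stacked = np.stack(
        [padded[i : i + n_frames] for i in range(2 * context + 1)], axis=1
    )
    return stacked.reshape(n_frames, -1)

# Toy example: 100 frames of a 257-bin log-magnitude spectrogram
spec = np.random.randn(100, 257)
features = stack_frames(spec, context=3)
print(features.shape)  # (100, 1799), i.e., 257 bins * 7 frames
```

The trade-off the paper tunes is visible here: a larger context gives the network more temporal information but multiplies the input dimension (and hence compute and latency), which matters for real-time use on a hearing aid app.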

