research-article

CNN-based Robust Sound Source Localization with SRP-PHAT for the Extreme Edge

Published: 19 April 2023

Abstract

Robust sound source localization in environments with noise and reverberation increasingly exploits deep neural networks fed with various acoustic features. Yet state-of-the-art research mainly focuses on optimizing algorithmic accuracy, resulting in large models that prevent deployment on edge devices. The edge, however, demands real-time, low-footprint acoustic reasoning for applications such as hearing aids and robot interaction. Hence, we start from a robust CNN-based model that uses SRP-PHAT features, Cross3D [16], and pursue an efficient yet compact model architecture for the extreme edge. For the SRP feature representation and the neural network, we propose our scalable LC-SRP-Edge and Cross3D-Edge algorithms, respectively, both optimized for lower hardware overhead. LC-SRP-Edge halves the complexity and on-chip memory overhead of the sinc interpolation compared to the original LC-SRP [19]. Across multiple SRP resolution cases, Cross3D-Edge reduces computational complexity by 10.32% to 73.71% and neural network weights by 59.77% to 94.66% relative to the Cross3D baseline. In terms of the accuracy-efficiency trade-off, the most balanced version (EM) requires only 127.1 MFLOPS of computation, 3.71 MByte/s of bandwidth, and 0.821 MByte of on-chip memory in total, while remaining competitive in state-of-the-art accuracy comparisons. It achieves an end-to-end latency of 8.59 ms/frame on a Raspberry Pi 4B, 7.26× faster than the corresponding baseline.
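For readers new to SRP-PHAT, the following Python sketch computes a conventional frequency-domain SRP-PHAT map for a single frame: the PHAT-weighted cross-spectrum of every microphone pair is steered to each candidate direction and summed. It is a minimal illustration under stated assumptions, not the paper's LC-SRP-Edge or the LC-SRP of [19]; those rely on sinc interpolation, which is not shown here. Assumptions: a 2D far-field model, a hypothetical four-microphone circular array of 5 cm radius, a speed of sound of 343 m/s, a naive DFT, and a synthetic test signal. All function names (`dft`, `arrival_delay`, `srp_phat_map`, `simulate_frames`) are hypothetical.

```python
import cmath
import math
import random
from itertools import combinations

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def dft(frame):
    """Naive O(N^2) DFT, X[k] = sum_n x[n] * exp(-2j*pi*k*n/N); an FFT would be used in practice."""
    n_len = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * n / n_len) for n, x in enumerate(frame))
        for k in range(n_len)
    ]


def arrival_delay(mic_xy, azimuth):
    """Far-field arrival time (s) at a mic relative to the array origin.

    Mics with a larger projection onto the source direction hear the wavefront
    earlier, so their delay is negative.
    """
    ux, uy = math.cos(azimuth), math.sin(azimuth)
    return -(mic_xy[0] * ux + mic_xy[1] * uy) / SPEED_OF_SOUND


def srp_phat_map(frames, mic_positions, fs, azimuths):
    """Frequency-domain SRP-PHAT power for each candidate azimuth (2D, far field)."""
    n_len = len(frames[0])
    spectra = [dft(frame) for frame in frames]
    bins = list(range(1, n_len // 2))  # positive-frequency bins; DC and Nyquist skipped
    phat = {}
    for i, j in combinations(range(len(frames)), 2):
        # PHAT weighting keeps only the phase of the cross-spectrum X_i * conj(X_j).
        cross = [spectra[i][k] * spectra[j][k].conjugate() for k in bins]
        phat[(i, j)] = [g / (abs(g) + 1e-12) for g in cross]
    powers = []
    for az in azimuths:
        delays = [arrival_delay(p, az) for p in mic_positions]
        total = 0.0
        for (i, j), weights in phat.items():
            tau = delays[i] - delays[j]  # TDOA this pair would show for a source at `az`
            for k, w in zip(bins, weights):
                omega = 2 * math.pi * k * fs / n_len
                # Undo the hypothesised phase shift; each phase-aligned term contributes 1.
                total += (w * cmath.exp(1j * omega * tau)).real
        powers.append(total)
    return powers


def simulate_frames(mic_positions, azimuth, fs, n_len, seed=0):
    """Synthetic frames: one bin-centred cosine per bin with a random phase, delayed
    analytically so fractional delays are exact and there is no spectral leakage."""
    rng = random.Random(seed)
    phases = {k: rng.uniform(0.0, 2 * math.pi) for k in range(1, n_len // 2)}
    frames = []
    for pos in mic_positions:
        t_m = arrival_delay(pos, azimuth)
        frames.append([
            sum(math.cos(2 * math.pi * (k * fs / n_len) * (n / fs - t_m) + ph)
                for k, ph in phases.items())
            for n in range(n_len)
        ])
    return frames


if __name__ == "__main__":
    fs, n_len = 16000, 64
    # Assumed geometry: four mics on a circle of 5 cm radius.
    mics = [(0.05 * math.cos(a), 0.05 * math.sin(a))
            for a in (0.0, math.pi / 2, math.pi, 3 * math.pi / 2)]
    grid = [math.radians(d) for d in range(0, 360, 5)]
    frames = simulate_frames(mics, math.radians(60), fs, n_len)
    powers = srp_phat_map(frames, mics, fs, grid)
    best = max(range(len(grid)), key=powers.__getitem__)
    print(round(math.degrees(grid[best])))  # 60
```

Run as a script, it prints 60, the simulated azimuth: at the true direction every PHAT-weighted term is phase-aligned and contributes 1, so the peak equals the number of microphone pairs times the number of bins (6 × 31 = 186 here). The nested loops cost on the order of pairs × bins × directions per frame, which is the kind of cost that low-complexity SRP variants such as LC-SRP [19] set out to reduce.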

REFERENCES

[1] Sharath Adavanne, Archontis Politis, Joonas Nikunen, and Tuomas Virtanen. 2018. Sound event localization and detection of overlapping sources using convolutional recurrent neural networks. IEEE Journal of Selected Topics in Signal Processing 13, 1 (2018), 34–48.
[2] Sharath Adavanne, Archontis Politis, and Tuomas Virtanen. 2018. Direction of arrival estimation for multiple sound sources using convolutional recurrent neural network. In Proceedings of the 2018 26th European Signal Processing Conference. IEEE, 1462–1466.
[3] Sharath Adavanne, Archontis Politis, and Tuomas Virtanen. 2019. Localization, detection and tracking of multiple moving sound sources with a convolutional recurrent neural network. In Workshop on Detection and Classification of Acoustic Scenes and Events.
[4] Sharath Adavanne, Archontis Politis, and Tuomas Virtanen. 2019. A multi-room reverberant dataset for sound event localization and detection. In Workshop on Detection and Classification of Acoustic Scenes and Events.
[5] Jont B. Allen and David A. Berkley. 1979. Image method for efficiently simulating small-room acoustics. The Journal of the Acoustical Society of America 65, 4 (1979), 943–950.
[6] Yin Cao, Turab Iqbal, Qiuqiang Kong, Fengyan An, Wenwu Wang, and Mark D. Plumbley. 2021. An improved event-independent network for polyphonic sound event localization and detection. In Proceedings of the ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 885–889.
[7] Soumitro Chakrabarty and Emanuël A. P. Habets. 2017. Broadband DOA estimation using convolutional neural networks trained with noise signals. In Proceedings of the 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. IEEE, 136–140.
[8] Soumitro Chakrabarty and Emanuël A. P. Habets. 2017. Multi-speaker localization using convolutional neural network trained with noise. arXiv:1712.04276 [cs.SD].
[9] Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Q. Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. 2018. TVM: An automated end-to-end optimizing compiler for deep learning. In USENIX Symposium on Operating Systems Design and Implementation.
[10] Paolo Chiariotti, Milena Martarelli, and Paolo Castellini. 2019. Acoustic beamforming for noise source localization–Reviews, methodology and applications. Mechanical Systems and Signal Processing 120 (2019), 422–448. https://www.sciencedirect.com/science/article/abs/pii/S088832701830637X.
[11] Kyunghyun Cho, Bart van Merrienboer, Çaglar Gülçehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Conference on Empirical Methods in Natural Language Processing.
[12] François Chollet. 2017. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1251–1258.
[13] Luca Comanducci, Federico Borra, Paolo Bestagini, Fabio Antonacci, Stefano Tubaro, and Augusto Sarti. 2020. Source localization using distributed microphones in reverberant environments based on deep learning and ray space transform. IEEE/ACM Transactions on Audio, Speech, and Language Processing 28 (2020), 2238–2251. https://ieeexplore.ieee.org/abstract/document/9146703.
[14] Jorge Dávila-Chacón, Jindong Liu, and Stefan Wermter. 2018. Enhanced robot speech recognition using biomimetic binaural sound source localization. IEEE Transactions on Neural Networks and Learning Systems 30, 1 (2018), 138–150.
[15] David Diaz-Guerra. 2020. Cross3D Codebase. Retrieved from https://github.com/DavidDiazGuerra/Cross3D.
[16] David Diaz-Guerra, Antonio Miguel, and Jose R. Beltran. 2020. Robust sound source tracking using SRP-PHAT and 3D convolutional neural networks. IEEE/ACM Transactions on Audio, Speech, and Language Processing 29 (2020), 300–311. https://ieeexplore.ieee.org/abstract/document/9268154.
[17] David Diaz-Guerra, Antonio Miguel, and Jose R. Beltran. 2021. gpuRIR: A python library for room impulse response simulation with GPU acceleration. Multimedia Tools and Applications 80, 4 (2021), 5653–5671.
[18] Joseph H. DiBiase, Harvey F. Silverman, and Michael S. Brandstein. 2001. Robust localization in reverberant rooms. In Proceedings of the Microphone Arrays. Springer, 157–180.
[19] Thomas Dietzen, Enzo De Sena, and Toon van Waterschoot. 2020. Low-complexity steered response power mapping based on Nyquist-Shannon sampling. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA’21). 206–210.
[20] Jacek P. Dmochowski, Jacob Benesty, and Sofiene Affes. 2007. A generalized steered response power method for computationally viable source localization. IEEE Transactions on Audio, Speech, and Language Processing 15, 8 (2007), 2510–2526.
[21] Hoang Do, Harvey F. Silverman, and Ying Yu. 2007. A real-time SRP-PHAT source location implementation using stochastic region contraction (SRC) on a large-aperture microphone array. In Proceedings of the 2007 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’07). Vol. 1. IEEE, I–121.
[22] Christine Evers, Heinrich W. Löllmann, Heinrich Mellmann, Alexander Schmidt, Hendrik Barfuss, Patrick A. Naylor, and Walter Kellermann. 2020. The LOCATA challenge: Acoustic source localization and tracking. IEEE/ACM Transactions on Audio, Speech, and Language Processing 28 (2020), 1620–1643. https://ieeexplore.ieee.org/abstract/document/9079214.
[23] Pierre-Amaury Grumiaux, Srđan Kitić, Laurent Girin, and Alexandre Guérin. 2021. Improved feature extraction for CRNN-based multiple sound source localization. In 29th European Signal Processing Conference (EUSIPCO’21). 231–235.
[24] Pierre-Amaury Grumiaux, Srđan Kitić, Laurent Girin, and Alexandre Guérin. 2021. A survey of sound source localization with deep learning methods. The Journal of the Acoustical Society of America 152, 1 (2021), 107.
[25] Karim Guirguis, Christoph Schorn, Andre Guntoro, Sherif Abdulatif, and Bin Yang. 2021. SELD-TCN: Sound event localization & detection via temporal convolutional networks. In Proceedings of the 2020 28th European Signal Processing Conference. IEEE, 16–20.
[26] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE International Conference on Computer Vision. 1026–1034.
[27] Toni Hirvonen. 2015. Classification of spatial audio location and content using convolutional neural networks. In Proceedings of the Audio Engineering Society Convention 138. Audio Engineering Society.
[28] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735–1780.
[29] Kotaro Hoshiba, Kai Washizaki, Mizuho Wakabayashi, Takahiro Ishiki, Makoto Kumon, Yoshiaki Bando, Daniel Gabriel, Kazuhiro Nakadai, and Hiroshi G. Okuno. 2017. Design of UAV-embedded microphone array system for sound source localization in outdoor environments. Sensors 17, 11 (2017), 2535.
[30] Yankun Huang, Xihong Wu, and Tianshu Qu. 2020. A time-domain unsupervised learning based sound source localization method. In Proceedings of the 2020 IEEE 3rd International Conference on Information Communication and Signal Processing. IEEE, 26–32.
[31] Daniel P. Jarrett, Emanuël A. P. Habets, and Patrick A. Naylor. 2017. Theory and Applications of Spherical Microphone Array Processing. Vol. 9. Springer.
[32] Wen Jie Jee, R. Mars, P. Pratik, S. Nagisetty, and C. S. Lim. 2019. Sound Event Localization and Detection using Convolutional Recurrent Neural Network. Technical Report, DCASE2019 Challenge.
[33] Sławomir Kapka and Mateusz Lewandowski. 2019. Sound source detection, localization and classification using consecutive ensemble of CRNN models. arXiv abs/1908.00766 (2019).
[34] Youngwook Kim and Hao Ling. 2011. Direction of arrival estimation of humans with a small sensor array using an artificial neural network. Progress In Electromagnetics Research B 27 (2011), 127–149.
[35] Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. CoRR abs/1412.6980 (2014).
[36] Charles Knapp and G. Clifford Carter. 1976. The generalized correlation method for estimation of time delay. IEEE Transactions on Acoustics, Speech, and Signal Processing 24, 4 (1976), 320–327.
[37] Qiuqiang Kong, Yin Cao, Turab Iqbal, Yong Xu, Wenwu Wang, and Mark D. Plumbley. 2019. Cross-task learning for audio tagging, sound event detection and spatial localization: DCASE 2019 baseline systems. arXiv:1904.03476 [cs.SD].
[38] Tribikram Kundu. 2014. Acoustic source localization. Ultrasonics 54, 1 (2014), 25–38.
[39] Tribikram Kundu, Hayato Nakatani, and Nobuo Takeda. 2012. Acoustic source localization in anisotropic plates. Ultrasonics 52, 6 (2012), 740–746.
[40] Guillaume Le Moing, Phongtharin Vinayavekhin, Tadanobu Inoue, Jayakorn Vongkulbhisal, Asim Munawar, Ryuki Tachibana, and Don Joven Agravante. 2019. Learning multiple sound source 2D localization. In Proceedings of the 2019 IEEE 21st International Workshop on Multimedia Signal Processing. IEEE, 1–6.
[41] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature 521, 7553 (2015), 436–444.
[42] Qinglong Li, Xueliang Zhang, and Hao Li. 2018. Online direction of arrival estimation based on deep learning. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2616–2620.
[43] Markus V. S. Lima, Wallace A. Martins, Leonardo O. Nunes, Luiz W. P. Biscainho, Tadeu N. Ferreira, Mauricio V. M. Costa, and Bowon Lee. 2015. A volumetric SRP with refinement step for sound source localization. IEEE Signal Processing Letters 22, 8 (2015), 1098–1102.
[44] Ilya Loshchilov and Frank Hutter. 2019. Decoupled weight decay regularization. arXiv:1711.05101 [cs.LG].
[45] Robert J. Marks II. 2012. Introduction to Shannon Sampling and Interpolation Theory. Springer Science & Business Media.
[46] Brian McFee, Colin Raffel, Dawen Liang, Daniel P. W. Ellis, Matt McVicar, Eric Battenberg, and Oriol Nieto. 2015. librosa: Audio and music signal analysis in python. In Proceedings of the 14th Python in Science Conference. Vol. 8. Citeseer, 18–25.
[47] Vicente Peruffo Minotto, Claudio Rosito Jung, Luiz Gonzaga da Silveira Jr., and Bowon Lee. 2013. GPU-based approaches for real-time sound source localization using the SRP-PHAT algorithm. The International Journal of High Performance Computing Applications 27, 3 (2013), 291–306.
[48] Javier Naranjo-Alcazar, Sergi Perez-Castanos, Jose Ferrandis, Pedro Zuccarello, and Maximo Cobos. 2021. Sound Event Localization and Detection using Squeeze-Excitation Residual CNNs. arXiv:2006.14436 [cs.SD].
[49] Haiqiang Niu, Emma Reeves, and Peter Gerstoft. 2017. Source localization in an ocean waveguide using supervised machine learning. The Journal of the Acoustical Society of America 142, 3 (2017), 1176–1188.
[50] Kyoungjin Noh, Jeong-Hwan C., Dongyeop J., and Joon-Hyuk C. 2019. Three-stage approach for sound event localization and detection. Technical Report, Detection and Classification of Acoustic Scenes and Events 2019 (DCASE) Challenge (2019). https://www.semanticscholar.org/paper/THREE-STAGE-APPROACH-FOR-SOUND-EVENT-LOCALIZATION-Noh-Choi/2e0962d0fc80a5b069a09716b35e4fa1ecdb97b1.
[51] Vassil Panayotov, Guoguo Chen, Daniel Povey, and Sanjeev Khudanpur. 2015. Librispeech: An ASR corpus based on public domain audio books. In Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 5206–5210.
[52] Lauréline Perotin, Romain Serizel, Emmanuel Vincent, and Alexandre Guérin. 2018. CRNN-based joint azimuth and elevation localization with the Ambisonics intensity vector. In Proceedings of the 2018 16th International Workshop on Acoustic Signal Enhancement. IEEE, 241–245.
[53] Pasi Pertilä and Emre Cakir. 2017. Robust direction estimation with convolutional neural networks based steered response power. In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 6125–6129.
[54] Archontis Politis, Sharath Adavanne, and Tuomas Virtanen. 2020. A dataset of reverberant spatial sound scenes with moving sources for sound event localization and detection. arXiv:2006.01919 [eess.AS].
[55] Nils Poschadel, Robert Hupke, Stephan Preihs, and Jürgen Peissig. 2021. Direction of arrival estimation of noisy speech using convolutional recurrent neural networks with higher-order ambisonics signals. In 29th European Signal Processing Conference (EUSIPCO’21). 211–215.
[56] Hadrien Pujol, Eric Bavu, and Alexandre Garcia. 2019. Source localization in reverberant rooms using Deep Learning and microphone arrays. In Proceedings of the 23rd International Congress on Acoustics.
[57] Hadrien Pujol, Eric Bavu, and Alexandre Garcia. 2021. BeamLearning: An end-to-end Deep Learning approach for the angular localization of sound sources using raw multichannel acoustic pressure data. The Journal of the Acoustical Society of America 149, 6 (2021), 4248–4263.
[58] Caleb Rascon and Ivan Meza. 2017. Localization of sound sources in robotics: A review. Robotics and Autonomous Systems 96 (2017), 184–210. https://www.sciencedirect.com/science/article/pii/S0921889016304742.
[59] Scott Rickard and Özgür Yilmaz. 2002. On the approximate W-disjoint orthogonality of speech. In Proceedings of the 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing. Vol. 1. IEEE, I–529.
[60] Reinhild Roden, Niko Moritz, Stephan Gerlach, Stefan Weinzierl, and Stefan Goetze. 2015. On Sound Source Localization of Speech Signals using Deep Neural Networks. https://www.semanticscholar.org/paper/On-sound-source-localization-of-speech-signals-deep-Roden-Moritz/cbbcd9214f1d25aaf4cae3cddbf0d9712056e837.
[61] Richard Roy and Thomas Kailath. 1989. ESPRIT-estimation of signal parameters via rotational invariance techniques. IEEE Transactions on Acoustics, Speech, and Signal Processing 37, 7 (1989), 984–995.
[62] Daniele Salvati, Carlo Drioli, and Gian Luca Foresti. 2018. Exploiting CNNs for improving acoustic source localization in noisy and reverberant conditions. IEEE Transactions on Emerging Topics in Computational Intelligence 2, 2 (2018), 103–116.
[63] Hiroshi Sawada, Ryo Mukai, and Shoji Makino. 2003. Direction of arrival estimation for multiple source signals using independent component analysis. In Proceedings of the 7th International Symposium on Signal Processing and Its Applications. Vol. 2. IEEE, 411–414.
[64] Ralph Schmidt. 1986. Multiple emitter location and signal parameter estimation. IEEE Transactions on Antennas and Propagation 34, 3 (1986), 276–280.
[65] Christopher Schymura, Benedikt T. Bönninghoff, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, and Dorothea Kolossa. 2021. PILOT: Introducing transformers for probabilistic sound event localization. In Interspeech.
[66] Kazuki Shimada, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, and Yuki Mitsufuji. 2021. ACCDOA: Activity-coupled Cartesian direction of arrival representation for sound event localization and detection. In Proceedings of the ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 915–919.
[67] Kazuki Shimada, Naoya Takahashi, Shusuke Takahashi, and Yuki Mitsufuji. 2020. Sound event localization and detection using activity-coupled Cartesian DOA vector and RD3Net. arXiv:2006.12014 [eess.AS].
[68] Sunit Sivasankaran, Emmanuel Vincent, and Dominique Fohr. 2018. Keyword-based speaker localization: Localizing a target speaker in a multi-speaker environment. In Proceedings of the Interspeech 2018-19th Annual Conference of the International Speech Communication Association.
[69] Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, and Dong Yu. 2021. Deep learning based multi-source localization with source splitting and its effectiveness in multi-talker speech recognition. Comput. Speech Lang. 75 (2021), 101360.
[70] Dmitry Suvorov, Ge Dong, and Roman Zhukov. 2018. Deep residual network for sound source localization in the time domain. arXiv:1808.06429 [cs.SD].
[71] Sakari Tervo and Tapio Lokki. 2008. Interpolation methods for the SRP-PHAT algorithm. In Proceedings of the 11th International Workshop on Acoustic Echo and Noise Control. 14–17.
[72] Etienne Thuillier, Hannes Gamper, and Ivan J. Tashev. 2018. Spatial audio feature discovery with convolutional neural networks. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 6797–6801.
[73] Vlad M. Trifa, Ansgar Koene, Jan Morén, and Gordon Cheng. 2007. Real-time acoustic source localization in noisy environments for human-robot multimodal interaction. In Proceedings of the RO-MAN 2007-The 16th IEEE International Symposium on Robot and Human Interactive Communication. IEEE, 393–398.
[74] Hirofumi Tsuzuki, Mauricio Kugler, Susumu Kuroyanagi, and Akira Iwata. 2013. An approach for sound source localization by complex-valued neural network. IEICE Transactions on Information and Systems 96, 10 (2013), 2257–2265.
[75] Tim Van den Bogaert, Evelyne Carette, and Jan Wouters. 2011. Sound source localization using hearing aids with microphones placed behind-the-ear, in-the-canal, and in-the-pinna. International Journal of Audiology 50, 3 (2011), 164–176.
[76] Vishnuvardhan Varanasi, Harshit Gupta, and Rajesh M. Hegde. 2020. A deep learning framework for robust DOA estimation using spherical harmonic decomposition. IEEE/ACM Transactions on Audio, Speech, and Language Processing 28 (2020), 1248–1259. https://ieeexplore.ieee.org/abstract/document/9056464.
[77] Reza Varzandeh, Kamil Adiloğlu, Simon Doclo, and Volker Hohmann. 2020. Exploiting periodicity features for joint detection and DOA estimation of speech sources using convolutional neural networks. In Proceedings of the ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 566–570.
[78] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems. 5998–6008.
[79] Paolo Vecchiotti, Emanuele Principi, Stefano Squartini, and Francesco Piazza. 2018. Deep neural networks for joint voice activity detection and speaker localization. In Proceedings of the 2018 26th European Signal Processing Conference. IEEE, 1567–1571.
[80] Juan Manuel Vera-Diaz, Daniel Pizarro, and Javier Macias-Guarasa. 2018. Towards end-to-end acoustic localization using deep learning: From audio signals to source position coordinates. Sensors 18, 10 (2018), 3418.
[81] Emmanuel Vincent, Tuomas Virtanen, and Sharon Gannot. 2018. Audio Source Separation and Speech Enhancement. John Wiley & Sons.
[82] Qing Wang, Jun Du, Hua-Xin Wu, Jia Pan, Feng Ma, and Chin-Hui Lee. 2023. A four-stage data augmentation approach to ResNet-Conformer based acoustic modeling for sound event localization and detection. arXiv:2101.02919 [cs.SD].
[83] Qing Wang, Huaxin Wu, Zijun Jing, Feng Ma, Yi Fang, Yuxuan Wang, Tairan Chen, Jia Pan, Jun Du, and Chin-Hui Lee. 2020. The USTC-IFLYTEK system for sound event localization and detection of DCASE2020 challenge. Tech. Rep., DCASE2020 Challenge (2020). https://www.semanticscholar.org/paper/THE-USTC-IFLYTEK-SYSTEM-FOR-SOUND-EVENT-AND-OF-Wang-Wu/735990cac7c3791725ac4c846ac61a603409d66b.
[84] Zhong-Qiu Wang, Xueliang Zhang, and DeLiang Wang. 2018. Robust speaker localization guided by deep learning-based time-frequency masking. IEEE/ACM Transactions on Audio, Speech, and Language Processing 27, 1 (2018), 178–188.
[85] Samuel Williams, Andrew Waterman, and David Patterson. 2009. Roofline: An insightful visual performance model for multicore architectures. Communications of the ACM 52, 4 (2009), 65–76.
[86] Yifan Wu, Roshan Ayyalasomayajula, Michael J. Bianco, Dinesh Bharadia, and Peter Gerstoft. 2021. SSLIDE: Sound source localization for indoors based on deep learning. In Proceedings of the ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 4680–4684.
[87] Angeliki Xenaki, Jesper Bünsow Boldt, and Mads Græsbøll Christensen. 2018. Sound source localization and speech enhancement with sparse Bayesian learning beamforming. The Journal of the Acoustical Society of America 143, 6 (2018), 3912–3921.
[88] Xiong Xiao, Shengkui Zhao, Xionghu Zhong, Douglas L. Jones, Eng Siong Chng, and Haizhou Li. 2015. A learning-based approach to direction of arrival estimation in noisy and reverberant environments. In Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2814–2818.
[89] Bin Xu, Guodong Sun, Ran Yu, and Zheng Yang. 2012. High-accuracy TDOA-based localization without time synchronization. IEEE Transactions on Parallel and Distributed Systems 24, 8 (2012), 1567–1576.
[90] Nelson Yalta, Kazuhiro Nakadai, and Tetsuya Ogata. 2017. Sound source localization using deep learning models. Journal of Robotics and Mechatronics 29, 1 (2017), 37–48.
[91] Masahiro Yasuda, Yuma Koizumi, Shoichiro Saito, Hisashi Uematsu, and Keisuke Imoto. 2020. Sound event localization based on sound intensity vector refined by DNN-based denoising and source separation. In Proceedings of the ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 651–655.
[92] Karim Youssef, Sylvain Argentieri, and Jean-Luc Zarader. 2013. A learning-based approach to robust binaural sound localization. In Proceedings of the 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2927–2932.
[93] Wangyou Zhang, Ying Zhou, and Yanmin Qian. 2019. Robust DOA estimation based on convolutional neural network and time-frequency masking. In Proceedings of the INTERSPEECH. 2703–2707.


Published in

ACM Transactions on Embedded Computing Systems, Volume 22, Issue 3, May 2023 (546 pages)
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/3592782
Editor: Tulika Mitra


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 19 April 2023
• Online AM: 6 March 2023
• Accepted: 12 February 2023
• Revised: 18 November 2022
• Received: 17 February 2022
