research-article

NeuralSound: learning-based modal sound synthesis with acoustic transfer

Published: 22 July 2022

Abstract

We present a novel learning-based modal sound synthesis approach that includes a mixed vibration solver for modal analysis and a radiation network for acoustic transfer. Our mixed vibration solver consists of a 3D sparse convolution network and a Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) module for iterative optimization. Moreover, we highlight the correlation between a standard numerical vibration solver and our network architecture. Our radiation network predicts the Far-Field Acoustic Transfer maps (FFAT Maps) from the surface vibration of the object. For most new objects, the overall running time of our learning-based approach is less than one second on an RTX 3080 Ti GPU, while the sound quality remains close to the ground truth computed by standard numerical methods. We also evaluate the numerical and perceptual accuracy of our approach on objects with various shapes and materials.
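The abstract pairs a sparse convolution network with a LOBPCG module but does not spell out how the two hand off. Below is a minimal sketch of the LOBPCG half alone, using SciPy's `scipy.sparse.linalg.lobpcg` and assuming the network supplies an initial block of eigenvector guesses. It is not the authors' implementation. The spring-mass chain (`chain_matrices`), the helper names, the random initial block, and the sparse-LU preconditioner are all illustrative assumptions. Modal analysis solves the generalized eigenproblem K u = λ M u for the lowest modes, and each eigenvalue λ = ω² gives an undamped modal frequency f = √λ / (2π).

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, lobpcg, splu


def chain_matrices(n, k=1000.0, m=0.5):
    """Toy stiffness K and lumped mass M: n point masses joined by springs, both ends fixed.

    Hypothetical stand-in for a real FEM assembly of an object's stiffness and mass.
    """
    off = np.full(n - 1, -k)
    K = sp.diags([off, np.full(n, 2.0 * k), off], [-1, 0, 1], format="csc")
    M = sp.diags(np.full(n, m), format="csc")
    return K, M


def lowest_modes(K, M, num_modes, X0=None, seed=0):
    """Refine a block of guesses into the lowest generalized eigenpairs K u = lam M u."""
    n = K.shape[0]
    if X0 is None:
        # Hypothetical stand-in for a learned initial subspace: a random block.
        X0 = np.random.default_rng(seed).standard_normal((n, num_modes))
    # Preconditioner approximating K^{-1}; an exact sparse LU solve is affordable here
    # (an assumption for this toy, not necessarily what the paper uses).
    lu = splu(K)
    T = LinearOperator((n, n), matvec=lu.solve, matmat=lu.solve, dtype=np.float64)
    lam, U = lobpcg(K, X0, B=M, M=T, largest=False, tol=1e-6, maxiter=200)
    order = np.argsort(lam)  # report modes from lowest to highest
    return lam[order], U[:, order]


def modal_frequencies(lam):
    """Undamped modal frequencies in Hz from eigenvalues lam = omega^2."""
    return np.sqrt(lam) / (2.0 * np.pi)


if __name__ == "__main__":
    K, M = chain_matrices(200)
    lam, U = lowest_modes(K, M, num_modes=4)
    print(modal_frequencies(lam))
```

For this toy chain the eigenvalues have the closed form λ_j = (k/m) · 4 sin²(jπ / (2(n+1))), which makes the result easy to check. A better initial block mainly reduces the number of LOBPCG iterations. It does not change the converged modes.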

Supplemental Material

3528223.3530184.mp4 (presentation)

121-786-supp-video.mp4 (supplemental material)

      Published in

        ACM Transactions on Graphics, Volume 41, Issue 4
        July 2022
        1978 pages
        ISSN: 0730-0301
        EISSN: 1557-7368
        DOI: 10.1145/3528223

        Copyright © 2022 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States
