
Efficient acoustic perception for virtual AI agents

Published: 27 September 2021

Abstract

We efficiently model acoustic perception for AI agents in complex scenes containing many sound events. The key idea is to employ perceptual parameters that capture how each sound event propagates through the scene to the agent's location, naturally conforming virtual perception to human perception. We propose a simplified auditory masking model that limits localization capability in the presence of distracting sounds, and show that anisotropic reflections, as well as the initial sound, serve as useful localization cues. Our system is simple, fast, and modular, and it produces natural results in our tests: agents navigate through passageways and portals by sound alone, and anticipate or track occluded but audible targets. Source code is provided.
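The paper's actual masking model and parameters are not reproduced on this page, but the idea of limiting localization under distracting sounds can be illustrated with a minimal energetic-masking sketch. Here all names, the single-band loudness representation, and the 10 dB masking threshold are assumptions for illustration, not the authors' implementation:

```python
from dataclasses import dataclass

@dataclass
class SoundEvent:
    name: str
    loudness_db: float                    # perceived loudness at the agent's location
    direction: tuple[float, float, float]  # arrival direction of the initial sound

def localizable_events(events: list[SoundEvent],
                       masking_threshold_db: float = 10.0) -> list[SoundEvent]:
    """Return the events the agent can localize under a simplified
    energetic-masking rule: an event is localizable only if its loudness
    is within `masking_threshold_db` of the loudest concurrent event."""
    if not events:
        return []
    loudest = max(e.loudness_db for e in events)
    return [e for e in events
            if loudest - e.loudness_db <= masking_threshold_db]
```

Under this toy rule, a quiet footstep concurrent with a loud explosion is masked and the agent cannot localize it, while events of comparable loudness remain localizable; a full model would additionally account for frequency bands and the directional cues the paper describes.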


