Abstract
We model acoustic perception for AI agents efficiently in complex scenes containing many sound events. The key idea is to employ perceptual parameters that capture how each sound event propagates through the scene to the agent's location, which naturally conforms virtual perception to human hearing. We propose a simplified auditory masking model that limits localization ability in the presence of distracting sounds, and show that anisotropic reflections, as well as the initial sound, serve as useful localization cues. Our system is simple, fast, and modular, and produces natural results in our tests: agents can navigate through passageways and portals by sound alone, and anticipate or track occluded but audible targets. Source code is provided.
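To illustrate the kind of masking test the abstract alludes to, here is a minimal sketch of a loudness-based audibility check an agent might run per sound event. All names, the dB bookkeeping, and the 3 dB masking margin are illustrative assumptions for this sketch, not the paper's actual model.

```python
import math

def masked(target_db: float, distractor_dbs: list[float],
           margin_db: float = 3.0) -> bool:
    """Return True if distracting sounds drown out the target event.

    target_db / distractor_dbs are loudness levels at the agent's
    position.  Distractor powers are summed (dB -> linear power -> dB)
    and compared against the target minus a masking margin.
    """
    if not distractor_dbs:
        return False
    total_power = sum(10 ** (d / 10.0) for d in distractor_dbs)
    total_db = 10.0 * math.log10(total_power)
    return target_db < total_db - margin_db

# A lone target stays localizable; a quiet one amid louder chatter does not.
print(masked(60.0, []))            # -> False
print(masked(40.0, [55.0, 52.0]))  # -> True
```

An agent could gate its localization logic on such a test, attending only to events that survive masking by the rest of the scene.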
Supplemental Material
Available for Download
Supplemental movie, appendix, image, and software files for "Efficient acoustic perception for virtual AI agents"