Research Article (Open Access)

Seeing Sound: Investigating the Effects of Visualizations and Complexity on Crowdsourced Audio Annotations

Published: 06 December 2017

Abstract

Audio annotation is key to developing machine-listening systems; yet, effective ways to accurately and rapidly obtain crowdsourced audio annotations are understudied. In this work, we seek to quantify the reliability/redundancy trade-off in crowdsourced soundscape annotation, investigate how visualizations affect accuracy and efficiency, and characterize how performance varies as a function of audio characteristics. Using a controlled experiment, we varied sound visualizations and the complexity of the soundscapes presented to human annotators. Results show that more complex audio scenes yield lower annotator agreement, and that spectrogram visualizations produce higher-quality annotations at a lower cost in time and human labor. We also found that recall is more affected than precision by soundscape complexity, and that mistakes can often be attributed to certain sound event characteristics. These findings have implications not only for how we design annotation tasks and interfaces for audio data, but also for how we train and evaluate machine-listening systems.
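The abstract evaluates annotations in terms of precision and recall over sound events. As an illustrative sketch (not the paper's evaluation code), one common convention is to match an annotator's time intervals against reference events by temporal overlap: an annotation counts toward precision if it overlaps some reference event, and a reference event counts toward recall if some annotation overlaps it.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (onset, offset) in seconds

def overlaps(a: Interval, b: Interval) -> bool:
    """True if two time intervals intersect."""
    return a[0] < b[1] and b[0] < a[1]

def precision_recall(reference: List[Interval],
                     annotations: List[Interval]) -> Tuple[float, float]:
    """Event-level precision/recall under simple overlap matching:
    an annotation is a hit if it overlaps any reference event, and a
    reference event is recalled if any annotation overlaps it."""
    hits = sum(any(overlaps(a, r) for r in reference) for a in annotations)
    recalled = sum(any(overlaps(r, a) for a in annotations) for r in reference)
    precision = hits / len(annotations) if annotations else 0.0
    recall = recalled / len(reference) if reference else 0.0
    return precision, recall

# Two reference events; the annotator marks one correctly plus one spurious event.
ref = [(0.0, 1.0), (2.0, 3.0)]
ann = [(0.2, 0.9), (5.0, 6.0)]
print(precision_recall(ref, ann))  # (0.5, 0.5)
```

Under this framing, a missed event in a dense soundscape lowers recall without touching precision, which is consistent with the finding that recall suffers more as complexity grows.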

