Evaluation of Information Comprehension in Concurrent Speech-based Designs

Published: 17 December 2020
Abstract

In human-computer interaction, particularly in multimedia delivery, information is communicated to users sequentially, yet users are capable of receiving information from multiple sources concurrently. This mismatch suggests that a sequential mode of communication does not exploit human perceptual capabilities as efficiently as it could. This article reports an experiment that investigated several speech-based (audio) concurrent designs and evaluated the depth of information comprehension by comparing comprehension performance across different question formats (main/detailed, implied/stated). The results showed that users, besides answering the main questions, also succeeded in answering the implied questions and the questions that required detailed information, and that the pattern of comprehension depth remained similar to that seen in a baseline condition in which only one speech source was presented. However, participants answered more questions correctly when the questions were drawn from the main information, and performance remained low when the questions were drawn from detailed information. These results encourage further exploration of concurrent methods for communicating multiple information streams efficiently in human-computer interaction, including multimedia.
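The designs evaluated here present more than one speech stream to the listener at the same time. One common concurrent design in this literature is dichotic presentation, where each stream is routed to a different ear. The following is a minimal illustrative sketch, not the article's actual apparatus: it assumes NumPy and a hypothetical helper name `mix_dichotic`, and combines two mono signals into one stereo buffer with stream A panned fully left and stream B fully right.

```python
import numpy as np

def mix_dichotic(stream_a, stream_b):
    """Combine two mono speech signals into one stereo buffer,
    panning stream A fully left and stream B fully right
    (a dichotic presentation of two concurrent streams)."""
    n = max(len(stream_a), len(stream_b))
    # Zero-pad the shorter stream so both channels have equal length.
    a = np.pad(stream_a, (0, n - len(stream_a)))
    b = np.pad(stream_b, (0, n - len(stream_b)))
    return np.stack([a, b], axis=1)  # shape (n, 2): columns are left, right

# Stand-ins for two speech recordings of unequal length (16 kHz mono).
a = np.sin(2 * np.pi * 220 * np.arange(8000) / 16000)
b = np.sin(2 * np.pi * 330 * np.arange(6000) / 16000)
stereo = mix_dichotic(a, b)
```

The resulting `(n, 2)` array can be written out or played back with any stereo-capable audio library; spatial designs other than full left/right separation would instead apply intermediate per-channel gains to each stream before summing.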

