Abstract
In human-computer interaction, particularly in multimedia delivery, information is communicated to users sequentially, whereas users are capable of receiving information from multiple sources concurrently. This mismatch suggests that a sequential mode of communication does not exploit human perception capabilities as efficiently as possible. This article reports an experiment that investigated several speech-based (audio) concurrent designs and evaluated the depth of information comprehension by comparing performance across different question formats (main/detailed, implied/stated). The results showed that, besides answering the main questions, users were also successful in answering the implied questions and the questions that required detailed information, and that the pattern of comprehension depth remained similar to that observed in a baseline condition in which only one speech source was presented. However, participants answered more questions correctly when the questions were drawn from the main information, and performance remained low when the questions were drawn from detailed information. These results encourage further exploration of concurrent methods for communicating multiple information streams efficiently in human-computer interaction, including multimedia.
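The concurrent designs evaluated here present multiple speech streams at the same time, and a common way to keep such streams perceptually separable is dichotic presentation: routing each stream to a different ear. The following is a minimal, illustrative sketch of that idea (the function name and framing are assumptions for illustration, not part of the study's apparatus):

```python
def dichotic_mix(left_stream, right_stream, gain_left=1.0, gain_right=1.0):
    """Combine two mono sample sequences into stereo (left, right) frames.

    Each speech stream is hard-panned entirely to one ear, the simplest
    form of spatial separation used in concurrent-speech listening
    studies. Streams of unequal length are zero-padded with silence.
    """
    n = max(len(left_stream), len(right_stream))
    pad = lambda s: list(s) + [0.0] * (n - len(s))
    l, r = pad(left_stream), pad(right_stream)
    # One (left, right) tuple per output frame; per-ear gains allow
    # attenuating one stream relative to the other.
    return [(gain_left * a, gain_right * b) for a, b in zip(l, r)]

# Speech A in the left ear only, speech B in the right ear only;
# the shorter stream is padded with silence.
frames = dichotic_mix([0.1, 0.2, 0.3], [0.5, 0.4])
```

In a real playback system, the resulting frames would be written to a stereo audio device or file; balancing the per-ear gains is one of the design parameters such concurrent formats can vary.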
Evaluation of Information Comprehension in Concurrent Speech-based Designs