ABSTRACT
Blind people rely mostly on the auditory feedback of screen readers to consume digital information. Still, how fast information can be processed remains a major problem. Using faster speech rates is one of the main techniques to speed up the consumption of digital information. Moreover, recent experiments have suggested concurrent speech as a valid alternative when scanning for relevant information. In this paper, we present an experiment with 30 visually impaired participants in which we compare faster speech rates against concurrent speech. We also combine the two approaches by gradually increasing the speech rate with one, two, and three voices. Results show that concurrent voices with speech rates slightly faster than the default rate enable significantly faster scanning for relevant content while maintaining comprehension. In contrast, to keep up with concurrent speech timings, One-Voice requires larger speech rate increments, which cause a considerable loss in performance. Overall, results suggest that the best compromise between efficiency and the ability to understand each sentence is the use of Two-Voices at 1.75 times the default rate (approximately 278 WPM).