DOI: 10.1145/2700648.2809840 · ASSETS Conference Proceedings · Research article

Faster Text-to-Speeches: Enhancing Blind People's Information Scanning with Faster Concurrent Speech

ABSTRACT

Blind people rely mostly on the auditory feedback of screen readers to consume digital information, yet how fast that information can be processed remains a major problem. Using faster speech rates is one of the main techniques to speed up the consumption of digital information, and recent experiments have suggested concurrent speech as a valid alternative when scanning for relevant information. In this paper, we present an experiment with 30 visually impaired participants in which we compare the use of faster speech rates against the use of concurrent speech. We also combine the two approaches by gradually increasing the speech rate with one, two, and three voices. Results show that concurrent voices at speech rates slightly faster than the default enable significantly faster scanning for relevant content while maintaining comprehension. In contrast, to keep up with concurrent-speech timings, One-Voice requires larger speech-rate increments, which cause a considerable loss in performance. Overall, results suggest that the best compromise between efficiency and the ability to understand each sentence is the use of Two-Voices at 1.75 times the default rate (approximately 278 WPM).
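The relationship between rate multipliers and words per minute in the abstract can be sketched as a small calculation. Note that the baseline of roughly 159 WPM is an inference from the reported figure (278 / 1.75), not a value stated in the paper, and multipliers other than 1.75 below are illustrative only:

```python
# Sketch: relating speech-rate multipliers to words per minute (WPM).
# The abstract reports 1.75 * default-rate ≈ 278 WPM, which implies a
# default rate of roughly 159 WPM (an inference, not stated directly).

DEFAULT_WPM = 278 / 1.75  # ≈ 158.9 WPM, derived from the reported figure


def wpm(multiplier: float, default_wpm: float = DEFAULT_WPM) -> float:
    """Words per minute for a given speech-rate multiplier."""
    return multiplier * default_wpm


# Illustrative sweep over multipliers (only 1.75 comes from the abstract).
for m in (1.0, 1.25, 1.5, 1.75, 2.0):
    print(f"{m:.2f}x -> {wpm(m):.0f} WPM")
```

At 1.75x this recovers the reported ~278 WPM, and it makes clear why large One-Voice increments are costly: each step of the multiplier adds ~40 WPM to the listening load.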


Index Terms

  1. Faster Text-to-Speeches
