DOI: 10.1145/3234695.3236343
ASSETS conference proceedings · research article

Towards More Robust Speech Interactions for Deaf and Hard of Hearing Users

Published online: 08 October 2018

ABSTRACT

Mobile, wearable, and other ubiquitous computing devices are increasingly creating contexts in which conventional keyboard and screen-based input is giving way to more natural speech-based interaction. Digital personal assistants use speech to control a wide range of functionality, from environmental controls to information access. However, many deaf and hard-of-hearing users have speech patterns that differ from those of hearing users due to incomplete acoustic feedback from their own voices. Because automatic speech recognition (ASR) systems are trained largely on speech from hearing individuals, speech-controlled technologies are typically inaccessible to deaf users. Prior work has focused on providing deaf users access to aural output via real-time captioning or signing, but little has been done to improve users' ability to provide input to these systems' speech-based interfaces. Further, the vocalization patterns of deaf speech often make accurate recognition intractable for both automated systems and human listeners, rendering traditional mitigations for ASR limitations, such as human captionists, less effective. To bridge this accessibility gap, we investigate the limitations of common speech recognition approaches and techniques, both automatic and human-powered, when applied to deaf speech. We then explore the effectiveness of an iterative crowdsourcing workflow and characterize the potential for groups to collectively exceed the performance of individuals. This paper contributes a better understanding of the challenges of deaf speech recognition and provides insights for future system development in this space.
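The abstract does not specify how recognition quality is measured or how group transcriptions are combined; as a minimal sketch of the two underlying ideas, the snippet below computes word error rate (WER, the standard ASR accuracy metric) and merges multiple transcripts by word-level majority vote. The merging step assumes the transcripts are already aligned word-for-word, which is a simplification; the reference sentence and worker transcripts are hypothetical toy data, not from the paper.

```python
from collections import Counter

def wer(ref, hyp):
    """Word error rate: word-level Levenshtein distance / reference length."""
    r, h = ref.split(), hyp.split()
    # dp[i][j] = edit distance between r[:i] and h[:j]
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = dp[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(r)][len(h)] / max(len(r), 1)

def merge_transcripts(transcripts):
    """Naive word-level majority vote over word-aligned transcripts."""
    rows = [t.split() for t in transcripts]
    return " ".join(Counter(col).most_common(1)[0][0] for col in zip(*rows))

reference = "turn off the living room lights"
workers = [
    "turn off the living room lights",
    "turn of the living room lights",    # one substitution error
    "turn off the living groom lights",  # a different substitution error
]
merged = merge_transcripts(workers)
print([round(wer(reference, w), 2) for w in workers])
print(wer(reference, merged))  # the vote cancels the uncorrelated errors
```

Because the two imperfect workers err on different words, the majority vote recovers the reference exactly, illustrating how a group can collectively exceed its individual members when errors are uncorrelated.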


