ABSTRACT

Mobile, wearable, and other ubiquitous computing devices are increasingly creating contexts in which conventional keyboard- and screen-based inputs are being displaced by more natural speech-based interactions. Digital personal assistants use speech to control a wide range of functionality, from environmental controls to information access. However, many deaf and hard-of-hearing users have speech patterns that differ from those of hearing users due to incomplete acoustic feedback from their own voices. Because automatic speech recognition (ASR) systems are largely trained on speech from hearing individuals, speech-controlled technologies are typically inaccessible to deaf users. Prior work has focused on giving deaf users access to aural output via real-time captioning or signing, but little has been done to improve their ability to provide input to these systems' speech-based interfaces. Further, the vocalization patterns of deaf speech often make accurate recognition intractable for both automated systems and human listeners, so traditional approaches to mitigating ASR limitations, such as human captionists, are less effective. To bridge this accessibility gap, we investigate the limitations of common speech recognition approaches---both automatic and human-powered---when applied to deaf speech. We then explore the effectiveness of an iterative crowdsourcing workflow, and characterize the potential for groups to collectively exceed the performance of individual contributors. This paper contributes a better understanding of the challenges of deaf speech recognition and provides insights for future system development in this space.
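The intuition behind groups collectively exceeding individual performance can be illustrated with a minimal word-level voting sketch. This is a hypothetical example, not the paper's method: the transcripts and helper names below are invented for illustration, and the voter assumes pre-aligned, equal-length transcripts, whereas real combination systems (e.g. ROVER-style approaches) first align hypotheses of differing lengths.

```python
from collections import Counter

def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution/match
    return d[len(r)][len(h)] / len(r)

def merge_by_vote(transcripts: list[str]) -> str:
    """Plurality vote at each word position across pre-aligned transcripts."""
    rows = [t.split() for t in transcripts]
    assert len({len(r) for r in rows}) == 1, "sketch assumes equal-length transcripts"
    return " ".join(Counter(col).most_common(1)[0][0] for col in zip(*rows))

# Three workers each mishear a different word of the same command,
# so every position still has a correct majority.
hyps = ["turn on the chicken lights",
        "turn in the kitchen lights",
        "turn on the kitchen nights"]
merged = merge_by_vote(hyps)  # -> "turn on the kitchen lights"
```

Here each individual transcript has a 20% word error rate against the intended command, yet because the errors are uncorrelated, the merged result is error-free: independent mistakes are outvoted position by position.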
Towards More Robust Speech Interactions for Deaf and Hard of Hearing Users