
Video accessibility enhancement for hearing-impaired users

Published: 04 November 2011

Abstract

More than 66 million people worldwide suffer from hearing impairment, a disability that makes video content difficult to understand because the audio information is lost. When scripts are available, captioning technology can help to a certain degree by displaying the scripts synchronously while a video plays. However, we show that existing captioning techniques fall far short of helping hearing-impaired audiences enjoy videos. In this article, we introduce a scheme to enhance video accessibility using a Dynamic Captioning approach, which draws on a rich set of technologies including face detection and recognition, visual saliency analysis, and text-speech alignment. Unlike existing methods, which can be categorized as static captioning, dynamic captioning places scripts at suitable positions to help hearing-impaired audiences better identify the speaking characters. In addition, it progressively highlights the scripts word by word by aligning them with the speech signal, and it illustrates the variation of voice volume. In this way, the audience can better track the scripts and perceive the moods conveyed by the variation of volume. We implemented the technology on 20 video clips and conducted an in-depth study with 60 hearing-impaired users. The results demonstrate the effectiveness and usefulness of the video accessibility enhancement scheme.
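The word-by-word highlighting and volume illustration described above can be sketched in miniature. Assuming word-level timestamps from a forced aligner and per-frame audio samples (all function names, the `alignment` data, and the energy-to-font-scale mapping below are illustrative, not from the paper), one might compute which words to highlight at a given playback time and how strongly to render them:

```python
import math

def rms_energy(samples):
    """Root-mean-square energy of one audio frame (a proxy for voice volume)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def highlighted_words(word_times, t):
    """word_times: list of (word, start_sec, end_sec) from a forced aligner.
    Returns the words whose interval has started by playback time t."""
    return [w for (w, start, end) in word_times if start <= t]

def font_scale(energy, ref=0.1, lo=0.8, hi=1.6):
    """Map frame energy to a caption font scale, clamped to [lo, hi].
    The log mapping and constants are purely illustrative."""
    scale = 1.0 + 0.3 * math.log10(max(energy / ref, 1e-6))
    return min(max(scale, lo), hi)

# Hypothetical alignment for a short line of dialogue.
alignment = [("how", 0.0, 0.2), ("are", 0.2, 0.35), ("you", 0.35, 0.6)]
print(highlighted_words(alignment, 0.3))  # → ['how', 'are']
```

A renderer would call `highlighted_words` once per video frame and style the returned prefix of the caption, scaling its size or weight by `font_scale(rms_energy(frame))` to convey volume variation.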




Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 7S, Issue 1
Special section on ACM Multimedia 2010 best paper candidates, and issue on social media
October 2011, 246 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/2037676

Copyright © 2011 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 4 November 2011
• Accepted: 1 July 2011
• Revised: 1 May 2011
• Received: 1 February 2011

              Qualifiers

              • research-article
              • Research
              • Refereed
