Abstract
More than 66 million people suffer from hearing impairment, and this disability makes video content difficult to understand because the audio information is lost. When scripts are available, captioning technology can help to a certain degree by displaying the scripts in synchrony with the video. However, we show that existing captioning techniques are far from satisfactory in helping hearing-impaired audiences enjoy videos. In this article, we introduce a scheme that enhances video accessibility using a Dynamic Captioning approach, which draws on a rich set of technologies including face detection and recognition, visual saliency analysis, and text-speech alignment. Unlike existing methods, which can be categorized as static captioning, dynamic captioning places scripts at suitable positions to help hearing-impaired viewers better recognize the speaking characters. In addition, it progressively highlights the scripts word by word by aligning them with the speech signal, and it illustrates the variation of voice volume. In this way, these viewers can better track the scripts and perceive the moods conveyed by changes in volume. We implemented the technology on 20 video clips and conducted an in-depth study with 60 hearing-impaired users. The results demonstrate the effectiveness and usefulness of the video accessibility enhancement scheme.
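The word-by-word highlighting and volume illustration described above can be sketched in a few lines. This is a minimal, hypothetical rendering helper, not the paper's implementation: it assumes forced alignment has already produced `(word, start_s, end_s)` triples, and the function names and the font-scale range are illustrative choices.

```python
# Sketch of dynamic-captioning playback logic (illustrative, not the
# authors' code). Assumes a text-speech aligner has emitted per-word
# (word, start, end) timestamps in seconds.

def highlighted_word_index(alignment, t):
    """Return the index of the word being spoken at playback time t,
    or None if t falls outside every word's interval."""
    for i, (_word, start, end) in enumerate(alignment):
        if start <= t < end:
            return i
    return None

def volume_scale(samples, lo=0.8, hi=1.4):
    """Map the RMS energy of a word's audio samples (normalized to
    [-1, 1]) to a font-size scale factor, so louder speech renders
    larger; lo/hi are assumed display bounds."""
    rms = (sum(x * x for x in samples) / len(samples)) ** 0.5
    return lo + (hi - lo) * min(rms, 1.0)

# Example alignment for one subtitle line.
alignment = [("Hello", 0.0, 0.4), ("my", 0.4, 0.6), ("name", 0.6, 0.9)]
```

A caption renderer would call `highlighted_word_index` on every frame to decide which word to emphasize, and `volume_scale` once per word to size it.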
Video accessibility enhancement for hearing-impaired users