research-article

Near-lossless semantic video summarization and its applications to video analysis

Published: 03 July 2013

Abstract

The ever-increasing volume of video content on the Web has created profound challenges for developing efficient indexing and search techniques to manage video data. Conventional techniques such as video compression and summarization strive for two commonly conflicting goals: low storage and high visual and semantic fidelity. To balance compression and summarization, this article presents a novel approach, called Near-Lossless Semantic Summarization (NLSS), which summarizes a video stream with the least loss of high-level semantic information by using an extremely small piece of metadata. The summary consists of compressed image and audio streams, as well as metadata for temporal structure and motion information. Even at a very low compression rate (around 1/40 of H.264 baseline, where traditional compression techniques can hardly preserve acceptable visual fidelity), NLSS can still be applied to many video-oriented tasks, such as visualization, indexing and browsing, duplicate detection, and concept detection. We evaluate NLSS on TRECVID and other video collections and demonstrate that it is a powerful tool for significantly reducing storage consumption while preserving high-level semantic fidelity.

Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 9, Issue 3
June 2013
121 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/2487268

Copyright © 2013 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 3 July 2013
• Accepted: 1 October 2012
• Revised: 1 August 2012
• Received: 1 February 2012

Qualifiers

• research-article
• Research
• Refereed
