Abstract
The ever-increasing volume of video content on the Web poses profound challenges for developing efficient techniques to index, search, and manage video data. Conventional techniques such as video compression and summarization pursue two often conflicting goals: low storage cost and high visual and semantic fidelity. Aiming to balance compression and summarization, this article presents a novel approach, called Near-Lossless Semantic Summarization (NLSS), which summarizes a video stream with minimal loss of high-level semantic information while using an extremely small amount of metadata. The summary consists of compressed image and audio streams, together with metadata describing temporal structure and motion. Even at a very low bitrate (around 1/40 of the H.264 baseline bitrate, where traditional compression techniques can hardly preserve acceptable visual fidelity), NLSS can still be applied to many video-oriented tasks, such as visualization, indexing and browsing, duplicate detection, and concept detection. We evaluate NLSS on TRECVID and other video collections, and demonstrate that it significantly reduces storage consumption while preserving high-level semantic fidelity.
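To make the composition of the summary more concrete, the following minimal sketch models it as a small Python data structure and expresses the "1/40 of H.264 baseline" figure as a bitrate ratio. Everything in it is a hypothetical illustration, not the article's actual format or measurements: the names `NLSSSummary`, `ShotMetadata`, and `bitrate_ratio`, the camera-motion labels, the assumed 400 kbps reference bitrate, and all byte counts are assumptions chosen only to make the arithmetic easy to follow.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ShotMetadata:
    # Hypothetical per-shot record: temporal structure (shot boundaries in
    # seconds) plus a coarse camera-motion label. Labels are illustrative only.
    start_s: float
    end_s: float
    camera_motion: str  # e.g. "static", "pan", "zoom" (assumed vocabulary)


@dataclass
class NLSSSummary:
    # Hypothetical container mirroring the abstract's description: a compressed
    # image stream, a compressed audio stream, and metadata for temporal
    # structure and motion. Only sizes are tracked, not actual encoded data.
    keyframe_bytes: List[int] = field(default_factory=list)  # size of each compressed image
    audio_bytes: int = 0                                     # size of the compressed audio stream
    shots: List[ShotMetadata] = field(default_factory=list)
    metadata_bytes: int = 0                                  # serialized size of the structure/motion metadata

    def total_bytes(self) -> int:
        # Total storage for the summary: images + audio + metadata.
        return sum(self.keyframe_bytes) + self.audio_bytes + self.metadata_bytes


def bitrate_ratio(summary_bytes: int, duration_s: float, reference_kbps: float) -> float:
    """Summary bitrate as a fraction of a reference bitrate (e.g. an assumed H.264 baseline rate)."""
    summary_kbps = summary_bytes * 8 / 1000 / duration_s  # bytes -> kilobits per second
    return summary_kbps / reference_kbps


if __name__ == "__main__":
    # Illustrative numbers: a 60 s clip whose reference encoding is assumed to be 400 kbps.
    summary = NLSSSummary(
        keyframe_bytes=[20000, 20000, 15000],
        audio_bytes=19000,
        shots=[ShotMetadata(0.0, 30.0, "static"), ShotMetadata(30.0, 60.0, "pan")],
        metadata_bytes=1000,
    )
    ratio = bitrate_ratio(summary.total_bytes(), duration_s=60.0, reference_kbps=400.0)
    print(f"{ratio:.4f}")  # prints "0.0250", i.e. 1/40 of the assumed reference bitrate
```

With these assumed values, the 75,000-byte summary corresponds to 10 kbps over 60 seconds, which is 1/40 of the assumed 400 kbps reference. Tracking the metadata size separately makes it easy to check how small a share of the total it occupies.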
Near-lossless semantic video summarization and its applications to video analysis