Abstract
While most existing video summarization approaches aim to identify important frames of a video from either a global or a local perspective, we propose a top-down approach consisting of scene identification and scene summarization. For scene identification, we represent each frame with global features and utilize a scalable clustering method. We then formulate scene summarization as choosing those frames that best cover a set of local descriptors with minimal redundancy. In addition, we develop a visual-word-based method to make our approach more computationally scalable. Experimental results on two benchmark datasets demonstrate that our proposed approach clearly outperforms state-of-the-art methods.
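The coverage formulation of scene summarization can be illustrated with a small sketch. The code below is not the method of the paper; it is a greedy set-cover heuristic written under the assumption that every frame of a scene has already been reduced to a set of visual-word identifiers. The function name `select_keyframes`, the `coverage` parameter, and the toy data are hypothetical and serve only as illustration.

```python
from typing import Dict, List, Set


def select_keyframes(frame_words: Dict[int, Set[int]], coverage: float = 0.9) -> List[int]:
    """Greedily pick frames until `coverage` of all visual words is covered.

    Assumption: `frame_words` maps a frame index to the set of visual-word
    ids observed in that frame. This is an illustrative greedy heuristic,
    not the optimization used in the paper.
    """
    universe: Set[int] = set().union(*frame_words.values())
    target = coverage * len(universe)
    covered: Set[int] = set()
    selected: List[int] = []
    while len(covered) < target:
        # Frame contributing the most not-yet-covered words; ties go to the lowest index.
        best = max(sorted(frame_words), key=lambda f: len(frame_words[f] - covered))
        gain = frame_words[best] - covered
        if not gain:
            break  # no frame adds anything new
        selected.append(best)
        covered |= gain
    return selected


# Toy scene with four frames; frame 2 is fully redundant with frame 0.
scene = {0: {1, 2, 3}, 1: {3, 4}, 2: {1, 2}, 3: {5}}
print(select_keyframes(scene, coverage=1.0))
```

Because a frame is scored only by the words it adds to the current summary, a frame whose descriptors are already covered is never selected, which is one simple way to express the "minimal redundancy" requirement.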