research-article

A Top-Down Approach for Video Summarization

Published: 4 September 2014

Abstract

While most existing video summarization approaches aim to identify the important frames of a video from either a global or a local perspective, we propose a top-down approach with two stages: scene identification and scene summarization. For scene identification, we represent each frame with global features and apply a scalable clustering method. We then formulate scene summarization as selecting the frames that best cover a set of local descriptors with minimal redundancy. In addition, we develop a visual word-based approach to make the method more computationally scalable. Experimental results on two benchmark datasets demonstrate that the proposed approach clearly outperforms state-of-the-art methods.
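The abstract does not name the clustering method or the exact coverage objective, so the following is only a minimal sketch of the two-stage, top-down structure under stated assumptions. Plain k-means with deterministic farthest-point initialization stands in for the "scalable clustering method"; each frame is assumed to be already quantized into a set of visual-word IDs; and a greedy maximum-coverage rule (a swapped-in technique, not necessarily the authors' formulation) picks the frames that add the most not-yet-covered visual words, stopping when the marginal gain drops below a hypothetical `min_gain` threshold. The function names `kmeans`, `greedy_scene_summary`, and `top_down_summary` are hypothetical.

```python
from typing import Dict, List, Optional, Set


def _sq_dist(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], k: int, iters: int = 20) -> List[int]:
    """Stage 1 (assumed stand-in): cluster frames by global features into k scenes."""
    # Deterministic farthest-point initialization: start from the first frame,
    # then repeatedly add the frame farthest from all chosen centers.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(_sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every frame to its nearest scene center.
        labels = [min(range(k), key=lambda c: _sq_dist(p, centers[c])) for p in points]
        # Move each center to the mean of its frames; keep it if the cluster is empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def greedy_scene_summary(frame_words: Dict[int, Set[int]], min_gain: int = 1) -> List[int]:
    """Stage 2 (assumed stand-in): greedy maximum coverage of visual words."""
    covered: Set[int] = set()
    selected: List[int] = []
    remaining = dict(frame_words)
    while remaining:
        best_frame: Optional[int] = None
        best_gain = 0
        for idx in sorted(remaining):
            # Gain counts only visual words not already covered, so redundant frames score low.
            gain = len(remaining[idx] - covered)
            if gain > best_gain:
                best_frame, best_gain = idx, gain
        if best_frame is None or best_gain < min_gain:
            break  # no remaining frame adds enough new content
        selected.append(best_frame)
        covered |= remaining.pop(best_frame)
    return sorted(selected)


def top_down_summary(global_feats: List[List[float]], frame_words: List[Set[int]],
                     k: int, min_gain: int = 1) -> List[int]:
    """Identify scenes first, then summarize each scene independently."""
    labels = kmeans(global_feats, k)
    summary: List[int] = []
    for scene in sorted(set(labels)):
        members = {i: frame_words[i] for i, lab in enumerate(labels) if lab == scene}
        summary.extend(greedy_scene_summary(members, min_gain))
    return sorted(summary)


if __name__ == "__main__":
    # Toy data: two visually distinct scenes of three frames each.
    feats = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]]
    words = [{1, 2, 3}, {1, 2}, {4}, {10, 11}, {10, 11, 12}, {10}]
    print(top_down_summary(feats, words, k=2))  # [0, 2, 4]
```

In the toy run, frame 1 is dropped because its visual words are already covered by frame 0, and frames 3 and 5 add nothing beyond frame 4. Raising `min_gain` trades coverage for a shorter summary.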



  • Published in

    ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 11, Issue 1
    August 2014
    151 pages
    ISSN:1551-6857
    EISSN:1551-6865
    DOI:10.1145/2665935

    Copyright © 2014 ACM

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 4 September 2014
    • Accepted: 1 April 2014
    • Revised: 1 January 2014
    • Received: 1 October 2013


    Qualifiers

    • Research
    • Refereed
