research-article

Aesthetics-Guided Summarization from Multiple User Generated Videos

Published: 07 January 2015

Abstract

In recent years, with the rapid development of camera technology and portable devices, we have witnessed a flourishing of user generated videos, which are gradually reshaping the traditional, professionally produced video media market. The volume of user generated videos in repositories is increasing rapidly. In today's video retrieval systems, a simple query returns many videos, which seriously increases the viewing burden. To manage these retrieved videos and provide viewers with an efficient way to browse, we introduce a system that automatically generates a summary from multiple user generated videos and presents their salient content to viewers in an enjoyable manner. Among multiple consumer videos, we find their quality to be highly diverse due to factors such as the photographer's experience or environmental conditions at the time of capture. This diversity in quality motivates us to include a video quality evaluation component in the video summarization, since videos of poor quality can seriously degrade the viewing experience. We first propose a probabilistic model to evaluate the aesthetic quality of each user generated video. This model compares the rich aesthetics information from several well-known photo databases with generic unlabeled consumer videos, under a human perception component indicating the correlation between a video and its constituent frames. Subjective studies were carried out, and the results indicate that our method is reliable. We then propose a novel graph-based formulation for the multi-video summarization task. Desirable summarization criteria are incorporated as the graph attributes, and the problem is solved through a dynamic programming framework. Comparisons with several state-of-the-art methods demonstrate that our algorithm outperforms them in generating a skimming video that preserves the essential scenes from the original multiple input videos, with smooth transitions among consecutive segments and appealing aesthetics overall.
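
The abstract does not spell out the graph attributes or the recurrence, so the following is only a minimal sketch of how a graph-based segment selection could be solved with dynamic programming. It assumes that candidate segments are already ordered in time, that each node carries a single score (for example, a mix of aesthetic quality and content importance), and that each edge carries a transition cost. The function name `best_skim`, the `transition_cost` callback, and the `max_segments` budget are hypothetical and are not taken from the article.

```python
from typing import Callable, List, Sequence, Tuple


def best_skim(
    scores: Sequence[float],
    transition_cost: Callable[[int, int], float],
    max_segments: int,
) -> Tuple[float, List[int]]:
    """Select between 1 and max_segments segments, kept in index order,
    maximizing the sum of node scores minus the transition costs between
    consecutive selected segments (hypothetical objective)."""
    n = len(scores)
    if n == 0 or max_segments <= 0:
        return 0.0, []

    neg_inf = float("-inf")
    # best[k][j]: best value of a chain of exactly k + 1 segments ending at j.
    best = [[neg_inf] * n for _ in range(max_segments)]
    # prev[k][j]: predecessor of j in that chain, or -1 if j starts the chain.
    prev = [[-1] * n for _ in range(max_segments)]

    for j in range(n):
        best[0][j] = scores[j]

    for k in range(1, max_segments):
        for j in range(n):
            for i in range(j):
                if best[k - 1][i] == neg_inf:
                    continue
                cand = best[k - 1][i] + scores[j] - transition_cost(i, j)
                if cand > best[k][j]:
                    best[k][j] = cand
                    prev[k][j] = i

    # Choose the best chain over all lengths and end positions.
    value, k, j = max(
        (best[k][j], k, j) for k in range(max_segments) for j in range(n)
    )

    # Walk the predecessor links back to the start of the chain.
    path: List[int] = []
    while j != -1:
        path.append(j)
        j = prev[k][j]
        k -= 1
    path.reverse()
    return value, path


if __name__ == "__main__":
    # Hypothetical scores and a gap penalty that grows with index distance.
    demo_scores = [0.9, 0.2, 0.8, 0.7]
    value, path = best_skim(demo_scores, lambda i, j: 0.1 * (j - i), 2)
    print(path, round(value, 3))
```

The table has one row per chain length, so the running time is O(max_segments · n²) for n candidate segments. A real system would likely replace the single node score and the index-gap penalty with the summarization criteria the article describes.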



Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 11, Issue 2
December 2014
197 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/2716635

Copyright © 2015 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 7 January 2015
• Revised: 1 August 2014
• Accepted: 1 August 2014
• Received: 1 February 2014

Qualifiers

• research-article
• Research
• Refereed
