Abstract
In recent years, with the rapid development of camera technology and portable devices, we have witnessed a flourishing of user-generated videos, which are gradually reshaping the traditional media market oriented toward professional video. The volume of user-generated videos in repositories is increasing rapidly. In today's video retrieval systems, a simple query returns many videos, which seriously increases the viewing burden. To manage these retrieved videos and provide viewers with an efficient way to browse them, we introduce a system that automatically generates a summary from multiple user-generated videos and presents their salient content to viewers in an enjoyable manner. Among multiple consumer videos, we find quality to be highly diverse due to factors such as the photographer's experience or the environmental conditions at the time of capture. This diversity motivates us to include a video quality evaluation component in the summarization, since videos of poor quality can seriously degrade the viewing experience. We first propose a probabilistic model to evaluate the aesthetic quality of each user-generated video. This model compares the rich aesthetic information from several well-known photo databases with generic unlabeled consumer videos, under a human perception component indicating the correlation between a video and its constituent frames. Subjective studies were carried out, and the results indicate that our method is reliable. We then propose a novel graph-based formulation for the multi-video summarization task. Desirable summarization criteria are incorporated as graph attributes, and the problem is solved through a dynamic programming framework. Comparisons with several state-of-the-art methods demonstrate that our algorithm outperforms them in generating a skimming video that preserves the essential scenes from the original input videos, with smooth transitions among consecutive segments and appealing aesthetics overall.
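To make the idea of a graph-based selection solved by dynamic programming more concrete, the following is a minimal sketch under stated assumptions, not the formulation used in the paper. It assumes that candidate segments are already ordered in time, that each node carries a single scalar quality score, and that each edge carries a transition penalty. The names `select_segments`, `scores`, `transition_cost`, and `k` are hypothetical and do not come from the abstract, which does not specify the graph attributes.

```python
from typing import Callable, List, Sequence, Tuple


def select_segments(
    scores: Sequence[float],
    transition_cost: Callable[[int, int], float],
    k: int,
) -> Tuple[float, List[int]]:
    """Choose k time-ordered segments maximizing score minus transition cost.

    scores[j] is an assumed per-segment quality score (node attribute);
    transition_cost(i, j) is an assumed penalty for cutting from segment i
    to a later segment j (edge attribute).
    """
    n = len(scores)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of segments")

    neg_inf = float("-inf")
    # best[c][j]: best value of a path with c + 1 segments that ends at j.
    best = [[neg_inf] * n for _ in range(k)]
    # prev[c][j]: predecessor of j on that best path (-1 if none).
    prev = [[-1] * n for _ in range(k)]

    for j in range(n):
        best[0][j] = scores[j]

    for c in range(1, k):
        for j in range(c, n):
            for i in range(c - 1, j):
                cand = best[c - 1][i] + scores[j] - transition_cost(i, j)
                if cand > best[c][j]:
                    best[c][j] = cand
                    prev[c][j] = i

    # Pick the best end node, then walk the predecessor table backwards.
    end = max(range(k - 1, n), key=lambda j: best[k - 1][j])
    value = best[k - 1][end]
    path = [end]
    for c in range(k - 1, 0, -1):
        end = prev[c][end]
        path.append(end)
    path.reverse()
    return value, path


if __name__ == "__main__":
    # Toy input: four segments, with a penalty that grows with the time gap.
    demo_scores = [0.9, 0.2, 0.8, 0.7]
    value, path = select_segments(
        demo_scores, lambda i, j: 0.1 * (j - i), k=2
    )
    print(path, round(value, 3))
```

The table has one row per summary length and one column per segment, so the sketch runs in O(k n^2) time. A real system would replace the scalar score and the gap-based penalty with whatever criteria the summarization is meant to satisfy.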