Abstract
Video key frame extraction is one of the most important research problems for video summarization, indexing, and retrieval. For a variety of applications such as ubiquitous media access and video streaming, the temporal boundaries between video key frames are required for synchronizing visual content with audio. In this article, we define temporal video sampling as a unified process of extracting video key frames and computing their temporal boundaries, and formulate it as an optimization problem. We first provide an optimal approach that minimizes temporal video sampling error using a dynamic programming process. The optimal approach retrieves a key frame hierarchy and all temporal boundaries in O(n^4) time and O(n^2) space. To further reduce computational complexity, we also provide a suboptimal greedy algorithm that exploits the data structure of a binary heap and uses a novel “look-ahead” computational technique, enabling all levels of key frames to be extracted with an average-case computational time of O(n log n) and memory usage of O(n). Both the optimal and the greedy methods are free of parameters, thus avoiding the threshold-selection problem that exists in other approaches. We empirically compare the proposed optimal and greedy methods with several existing methods in terms of video sampling error, computational cost, and subjective quality. An evaluation of eight videos of different genres shows that the greedy approach achieves performance very close to that of the optimal approach while drastically reducing computational cost, making it suitable for processing long video sequences in large video databases.
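To make the optimization view concrete, the following is a minimal sketch of one plausible dynamic-programming formulation. It is not the article's algorithm. It assumes that each frame is summarized by a single number, that the sampling error of a segment is the summed absolute difference between its frames and its key frame, and that exactly k key frames are requested. The names `segment_costs`, `sample_video`, and `dist` are hypothetical.

```python
def segment_costs(frames, dist):
    """Return (cost, key) tables for every contiguous segment.

    cost[i][j] is the smallest error when frames i..j (inclusive) are all
    represented by one key frame chosen from inside the segment, and
    key[i][j] is the index of that key frame. The error measure is an
    assumption made for illustration.
    """
    n = len(frames)
    cost = [[0.0] * n for _ in range(n)]
    key = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            best, best_r = float("inf"), i
            for r in range(i, j + 1):  # candidate key frame
                err = sum(dist(frames[t], frames[r]) for t in range(i, j + 1))
                if err < best:
                    best, best_r = err, r
            cost[i][j], key[i][j] = best, best_r
    return cost, key


def sample_video(frames, k, dist=lambda a, b: abs(a - b)):
    """Split frames into k contiguous segments, one key frame each.

    Returns (total_error, segments), where each segment is a tuple
    (key_index, first_index, last_index).
    """
    n = len(frames)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of frames")
    cost, key = segment_costs(frames, dist)
    inf = float("inf")
    # best[m][j]: minimum error for covering frames 0..j-1 with m segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):  # last segment covers frames i..j-1
                cand = best[m - 1][i] + cost[i][j - 1]
                if cand < best[m][j]:
                    best[m][j], back[m][j] = cand, i
    # Walk the back-pointers to recover key frames and temporal boundaries.
    segments, j = [], n
    for m in range(k, 0, -1):
        i = back[m][j]
        segments.append((key[i][j - 1], i, j - 1))
        j = i
    segments.reverse()
    return best[k][n], segments


error, segments = sample_video([0, 0, 1, 10, 10, 11], k=2)
```

Written this naively, the cost table alone takes O(n^4) time and O(n^2) space, which gives a feel for why a cheaper greedy alternative is attractive for long sequences. The sketch returns a single level of k key frames rather than a full hierarchy.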
Index Terms
Computational approaches to temporal sampling of video sequences