Computational approaches to temporal sampling of video sequences

Published: 01 May 2007

Abstract

Video key frame extraction is one of the most important research problems for video summarization, indexing, and retrieval. For a variety of applications such as ubiquitous media access and video streaming, the temporal boundaries between video key frames are required for synchronizing visual content with audio. In this article, we define temporal video sampling as a unified process of extracting video key frames and computing their temporal boundaries, and formulate it as an optimization problem. We first provide an optimal approach that minimizes temporal video sampling error using a dynamic programming process. The optimal approach retrieves a key frame hierarchy and all temporal boundaries in O(n^4) time and O(n^2) space. To further reduce computational complexity, we also provide a suboptimal greedy algorithm that exploits the data structure of a binary heap and uses a novel “look-ahead” computational technique, enabling all levels of key frames to be extracted with an average-case computational time of O(n log n) and memory usage of O(n). Both the optimal and the greedy methods are free of parameters, thus avoiding the threshold-selection problem that exists in other approaches. We empirically compare the proposed optimal and greedy methods with several existing methods in terms of video sampling error, computational cost, and subjective quality. An evaluation of eight videos of different genres shows that the greedy approach achieves performance very close to that of the optimal approach while drastically reducing computational cost, making it suitable for processing long video sequences in large video databases.
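
To make the optimization formulation concrete, the following is a minimal sketch of how a dynamic program can jointly choose key frames and their temporal boundaries. It is not the article's algorithm: the abstract does not define the sampling error, so the sketch assumes each frame is reduced to a single numeric feature and that a segment's error is the sum of absolute differences between its key frame and every frame in the segment. The names `segment_cost` and `sample_video` are hypothetical, and the sketch solves for one fixed number of key frames rather than a full hierarchy.

```python
def segment_cost(frames, i, j):
    """Return (error, key_index) for the segment frames[i:j].

    Assumption: the key frame is the frame inside the segment that
    minimizes the summed absolute difference to all frames in it.
    """
    def total_diff(k):
        return sum(abs(frames[t] - frames[k]) for t in range(i, j))

    key = min(range(i, j), key=total_diff)
    return total_diff(key), key


def sample_video(frames, num_keys):
    """Split frames into num_keys contiguous segments with minimum total error.

    Returns (total_error, segments), where each segment is a tuple
    (start, end, key_index) covering frames[start:end].
    """
    n = len(frames)
    inf = float("inf")

    # Precompute the error and key frame of every candidate segment.
    cost = {(i, j): segment_cost(frames, i, j)
            for i in range(n) for j in range(i + 1, n + 1)}

    # err[m][j]: minimum error when frames[0:j] is covered by m segments.
    # back[m][j]: start index of the last segment in that optimum.
    err = [[inf] * (n + 1) for _ in range(num_keys + 1)]
    back = [[0] * (n + 1) for _ in range(num_keys + 1)]
    err[0][0] = 0.0

    for m in range(1, num_keys + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                candidate = err[m - 1][i] + cost[(i, j)][0]
                if candidate < err[m][j]:
                    err[m][j] = candidate
                    back[m][j] = i

    # Walk the back-pointers from the end to recover the boundaries.
    segments = []
    j = n
    for m in range(num_keys, 0, -1):
        i = back[m][j]
        segments.append((i, j, cost[(i, j)][1]))
        j = i
    segments.reverse()
    return err[num_keys][n], segments


if __name__ == "__main__":
    features = [0, 0, 1, 10, 10, 11]  # toy per-frame feature values
    print(sample_video(features, 2))
```

Each returned tuple pairs a key frame index with the half-open frame range it represents, which is the kind of boundary information the abstract says is needed to synchronize visual content with audio. The exhaustive cost table keeps the sketch short but makes it impractical for long sequences.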

