research-article

Incorporating feature hierarchy and boosting to achieve more effective classifier training and concept-oriented video summarization and skimming

Published: 11 February 2008

Abstract

For online medical education, we have developed a novel scheme that uses the results of semantic video classification to select the most representative video shots and generate concept-oriented summaries and skims of surgery education videos. First, salient objects are used as the video patterns for feature extraction, giving a good representation of the intermediate video semantics. Salient objects are defined as the salient video compounds that characterize the most significant perceptual properties of the corresponding real-world physical objects in a video; the appearance of such salient objects can therefore be used to predict the appearance of the relevant semantic video concepts in a specific video domain. Second, a novel multi-modal boosting algorithm is developed to achieve more reliable video classifier training. By incorporating feature hierarchy and boosting, it dramatically reduces both the training cost and the number of training samples required, and thus significantly speeds up SVM (support vector machine) classifier training. In addition, unlabeled samples are integrated to reduce the human effort of labeling large numbers of training samples. Finally, the results of semantic video classification are incorporated to enable concept-oriented video summarization and skimming. Experimental results in a specific domain of surgery education videos are provided.
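
The final step, using classification results to pick representative shots per concept, can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the selection rule, so the sketch assumes a simple greedy choice by classifier confidence under a per-concept time budget. The `Shot` record, the `skim_by_concept` function, and the concept names are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Shot(NamedTuple):
    """Hypothetical record for one classified video shot."""
    shot_id: str       # illustrative identifier
    concept: str       # semantic concept predicted by the classifier
    confidence: float  # classifier confidence, assumed to lie in [0, 1]
    duration: float    # shot length in seconds


def skim_by_concept(shots, budget_per_concept):
    """Greedily keep the highest-confidence shots of each concept
    whose total duration fits within the given time budget (assumed rule)."""
    by_concept = defaultdict(list)
    for shot in shots:
        by_concept[shot.concept].append(shot)

    summary = {}
    for concept, group in by_concept.items():
        chosen, used = [], 0.0
        # Visit shots from most to least confident.
        for shot in sorted(group, key=lambda s: s.confidence, reverse=True):
            if used + shot.duration <= budget_per_concept:
                chosen.append(shot.shot_id)
                used += shot.duration
        summary[concept] = chosen
    return summary


# Made-up example data; the concept names are placeholders.
example_shots = [
    Shot("s1", "incision", 0.9, 4.0),
    Shot("s2", "incision", 0.6, 5.0),
    Shot("s3", "incision", 0.8, 3.0),
    Shot("s4", "suturing", 0.7, 6.0),
]
example_summary = skim_by_concept(example_shots, budget_per_concept=8.0)
```

With an 8-second budget per concept, the sketch keeps the two most confident "incision" shots and skips the third because it would exceed the budget. A real system would likely also weigh temporal coverage and redundancy between shots.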



