Abstract
For online medical education, we have developed a novel scheme that uses the results of semantic video classification to select the most representative video shots, enabling concept-oriented summarization and skimming of surgery education videos. First, salient objects are used as the video patterns for feature extraction, giving a good representation of the intermediate video semantics. Salient objects are defined as the salient video compounds that characterize the most significant perceptual properties of the corresponding real-world physical objects in a video; the appearance of such salient objects can therefore be used to predict the appearance of the relevant semantic video concepts in a specific video domain. Second, a novel multi-modal boosting algorithm is developed to achieve more reliable video classifier training. By incorporating feature hierarchy and boosting, it dramatically reduces both the training cost and the number of training samples required, and thus significantly speeds up SVM (support vector machine) classifier training. In addition, unlabeled samples are integrated to reduce the human effort needed to label large numbers of training samples. Finally, the results of semantic video classification are incorporated to enable concept-oriented video summarization and skimming. Experimental results in a specific domain of surgery education videos are provided.
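The final step, selecting representative shots from classification results, can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes each shot already carries a predicted concept label and a confidence score from some classifier, and it simply keeps the highest-scoring shots per concept. The `Shot` type, the `skim` function, the concept names, and the thresholds are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Shot(NamedTuple):
    shot_id: str
    concept: str       # label predicted by a (hypothetical) semantic classifier
    confidence: float  # classifier score in [0, 1]
    seconds: float     # shot duration


def skim(shots, per_concept=1, min_confidence=0.5):
    """Keep the highest-confidence shots for each predicted concept."""
    by_concept = defaultdict(list)
    for shot in shots:
        # Drop shots whose predicted label is too uncertain to represent a concept.
        if shot.confidence >= min_confidence:
            by_concept[shot.concept].append(shot)
    summary = {}
    for concept, group in by_concept.items():
        ranked = sorted(group, key=lambda s: s.confidence, reverse=True)
        summary[concept] = ranked[:per_concept]
    return summary


# Made-up example data; concept names are illustrative assumptions.
shots = [
    Shot("s1", "dialog", 0.91, 12.0),
    Shot("s2", "dialog", 0.64, 8.0),
    Shot("s3", "surgery", 0.88, 30.0),
    Shot("s4", "surgery", 0.42, 15.0),
    Shot("s5", "diagnosis", 0.30, 10.0),
]
result = skim(shots)
```

Raising `per_concept` lengthens the skim, while raising `min_confidence` trades coverage of concepts for reliability of the selected shots.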
Index Terms
Incorporating feature hierarchy and boosting to achieve more effective classifier training and concept-oriented video summarization and skimming