Abstract
Sports video attracts a global audience, and research in this area has focused on semantic event detection to facilitate access to and browsing of sports video. Most event detection methods for sports video rely on visual features. However, audio is a significant component of sports video and may also play an important role in semantic event detection. In this paper, we borrow the concept of the “keyword” from text mining to define a set of audio keywords: game-specific sounds that are strongly related to the actions of players, referees, commentators, and the audience, and that serve as reference points for interesting sports events. Unlike low-level features, audio keywords form a mid-level representation that facilitates high-level semantic analysis. Audio keywords are created from low-level audio features using support vector machines (SVMs). Combined with video shot information, the audio keywords are then used to detect semantic events in sports video with hidden Markov models (HMMs). Experiments on audio keyword creation and on keyword-based event detection have yielded encouraging results, which suggest that audio keywords are an effective representation for event detection in sports video. Applications to three types of sports demonstrate the practicality of the proposed method.
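The abstract describes a two-stage pipeline: SVMs map low-level audio features to audio keywords, and HMMs then detect events from sequences of those keywords. The following is a minimal sketch of the second stage only, under stated assumptions. It assumes the SVM stage has already assigned one keyword to each short audio segment of a shot-based candidate segment. The keyword vocabulary (`whistle`, `cheer`, `excited_speech`, `plain_speech`), the event labels (`goal`, `foul`, `no_event`), and every probability are hypothetical toy values, not the paper's keywords or trained parameters. Each event has its own discrete HMM. A keyword sequence is scored with the scaled forward algorithm, and the event whose model gives the highest log-likelihood is chosen.

```python
import math

# Hypothetical audio keyword vocabulary (illustrative only). In the described
# method, an SVM stage would assign one keyword to each short audio segment.
KEYWORDS = ["whistle", "cheer", "excited_speech", "plain_speech"]
IDX = {k: i for i, k in enumerate(KEYWORDS)}


class DiscreteHMM:
    """Discrete-observation HMM scored with the scaled forward algorithm."""

    def __init__(self, start, trans, emit):
        self.start = start  # start[i]: P(first hidden state is i)
        self.trans = trans  # trans[i][j]: P(next state is j | current state is i)
        self.emit = emit    # emit[i][k]: P(keyword k | hidden state i)

    def log_likelihood(self, keywords):
        """Return log P(keyword sequence | model)."""
        obs = [IDX[k] for k in keywords]
        n = len(self.start)
        # Initialization: alpha_1(i) = pi_i * b_i(o_1)
        alpha = [self.start[i] * self.emit[i][obs[0]] for i in range(n)]
        log_prob = 0.0
        for t in range(len(obs)):
            if t > 0:
                # Induction on the rescaled alphas:
                # alpha_t(j) = (sum_i alpha_{t-1}(i) * a_ij) * b_j(o_t)
                alpha = [
                    sum(alpha[i] * self.trans[i][j] for i in range(n))
                    * self.emit[j][obs[t]]
                    for j in range(n)
                ]
            # Rescale to avoid underflow; the product of the scale factors is P(O).
            scale = sum(alpha)
            log_prob += math.log(scale)
            alpha = [a / scale for a in alpha]
        return log_prob


# Toy per-event models with hand-picked probabilities (NOT trained values).
MODELS = {
    # State 0: build-up with excited commentary; state 1: crowd celebration.
    "goal": DiscreteHMM(
        start=[0.8, 0.2],
        trans=[[0.6, 0.4], [0.1, 0.9]],
        emit=[[0.05, 0.15, 0.60, 0.20], [0.05, 0.70, 0.20, 0.05]],
    ),
    # State 0: referee whistle; state 1: calmer commentary afterwards.
    "foul": DiscreteHMM(
        start=[0.7, 0.3],
        trans=[[0.5, 0.5], [0.2, 0.8]],
        emit=[[0.70, 0.05, 0.10, 0.15], [0.10, 0.05, 0.15, 0.70]],
    ),
    # Single-state background model used to reject non-event segments.
    "no_event": DiscreteHMM(
        start=[1.0],
        trans=[[1.0]],
        emit=[[0.05, 0.05, 0.20, 0.70]],
    ),
}


def classify(keywords, models=MODELS):
    """Return the event label whose HMM best explains the keyword sequence."""
    return max(models, key=lambda name: models[name].log_likelihood(keywords))


if __name__ == "__main__":
    segment = ["excited_speech", "excited_speech", "cheer", "cheer", "cheer"]
    print(classify(segment))  # prints: goal
```

In a real system, the HMM parameters would be estimated from labeled training sequences (for example, with Baum-Welch re-estimation) rather than set by hand. A rejection mechanism is also needed for segments that contain no event. Here the hypothetical `no_event` background model plays that role, though a likelihood threshold would be an alternative.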
Audio keywords generation for sports video analysis