research-article

Audio keywords generation for sports video analysis

Published: 16 May 2008

Abstract

Sports video attracts a global audience, and research in this area has focused on semantic event detection to make sports video easier to access and browse. Most event detection methods for sports video rely on visual features. However, audio is a significant component of sports video and can also play an important role in semantic event detection. In this paper, we borrow the concept of the “keyword” from the text mining domain to define a set of specific audio sounds: game-specific sounds that are strongly related to the actions of players, referees, commentators, and the audience, and that serve as reference points for interesting sports events. Unlike low-level features, audio keywords can be regarded as a mid-level representation that facilitates high-level analysis in terms of semantic concepts. Audio keywords are generated from low-level audio features using support vector machine (SVM) learning. Combined with video shot information, the generated audio keywords are then used to detect semantic events in sports video through hidden Markov model (HMM) learning. Experiments on audio keyword generation and on subsequent keyword-based event detection have produced very encouraging results. Based on these results, we believe that audio keywords are an effective representation that achieves satisfactory event detection performance in sports video. Applications to three types of sports demonstrate the practicality of the proposed method.
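
As an illustration of the keyword-generation step, the sketch below computes two classic low-level audio features per frame (short-time energy and zero-crossing rate) and trains a linear SVM with Pegasos-style subgradient descent to decide whether a frame contains one audio keyword. The feature pair, the frame length, the regularization constant `lam`, and the helper names (`frame_features`, `train_linear_svm`, `svm_score`, `detect_keyword`) are all hypothetical choices for illustration; the paper's actual feature set and SVM configuration (for example, its kernel) may differ. A set of keywords can be handled one-vs-rest: train one such detector per keyword and assign each frame to the keyword with the highest `svm_score`.

```python
def frame_features(samples, frame_len=256):
    """Compute [short-time energy, zero-crossing rate] for each
    non-overlapping frame of a mono signal (samples roughly in [-1, 1]).
    Hypothetical feature choice for illustration."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        # Count sign changes between consecutive samples.
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        feats.append([energy, crossings / (frame_len - 1)])
    return feats


def train_linear_svm(X, y, lam=0.01, epochs=500):
    """Train a linear SVM (hinge loss + L2 penalty) with Pegasos-style
    subgradient steps. Labels y are +1 (keyword present) or -1 (absent).
    A constant 1.0 is appended to each feature vector, so the last
    weight acts as a (regularized) bias term."""
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):  # fixed order keeps training deterministic
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: L2 penalty gradient
            if margin < 1.0:  # hinge loss active: step toward the sample
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def svm_score(w, x):
    """Signed decision value of the trained detector for one feature vector."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))


def detect_keyword(w, x):
    """Return +1 if the frame is labelled with the keyword, else -1."""
    return 1 if svm_score(w, x) >= 0 else -1
```

Training in the primal keeps the sketch dependency-free; in practice a kernel SVM from an established library would be the more usual choice. For instance, a detector trained on zero-crossing rates `[[0.9], [0.1], [0.8], [0.2]]` with labels `[1, -1, 1, -1]` separates high-ZCR frames (a plausible cue for a whistle-like keyword) from low-ZCR frames.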
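
The event-detection stage can be illustrated with discrete HMMs. The sketch below assumes one HMM per event class, a hypothetical four-word keyword vocabulary, hand-set placeholder parameters (real models would be learned, for example with Baum-Welch), and that the keyword sequence has already been cut at video-shot boundaries. Each shot is labelled with the event whose model assigns the highest forward log-likelihood. This is a standard maximum-likelihood HMM classifier, not necessarily the exact model topology used in the paper; the names `KEYWORDS`, `MODELS`, `log_forward`, and `classify_event` are hypothetical.

```python
import math

# Hypothetical audio-keyword vocabulary produced by the keyword stage.
KEYWORDS = ["speech", "excited_speech", "cheer", "whistle"]
SYM = {k: i for i, k in enumerate(KEYWORDS)}

# Hypothetical 2-state discrete HMMs, one per event class:
# (start probabilities, transition matrix, emission matrix over KEYWORDS).
# Placeholder values; real models would be trained, e.g. with Baum-Welch.
MODELS = {
    "highlight": (
        [0.8, 0.2],
        [[0.7, 0.3], [0.1, 0.9]],
        [[0.6, 0.2, 0.1, 0.1],       # state 0: build-up, mostly plain speech
         [0.05, 0.45, 0.45, 0.05]],  # state 1: climax, excited speech and cheering
    ),
    "normal_play": (
        [0.9, 0.1],
        [[0.8, 0.2], [0.5, 0.5]],
        [[0.8, 0.05, 0.05, 0.1],     # state 0: ordinary commentary
         [0.4, 0.05, 0.05, 0.5]],    # state 1: stoppage, whistle likely
    ),
}


def log_forward(seq, start, trans, emit):
    """Return log P(seq | HMM) using the forward algorithm with per-step scaling."""
    obs = [SYM[k] for k in seq]
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # alpha_t(i) = sum_j alpha_{t-1}(j) * a_ji * b_i(o_t)
            alpha = [sum(alpha[j] * trans[j][i] for j in range(n)) * emit[i][obs[t]]
                     for i in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence impossible under this model
        log_lik += math.log(scale)  # log P = sum of the log scaling factors
        alpha = [a / scale for a in alpha]
    return log_lik


def classify_event(seq, models=MODELS):
    """Label one shot's keyword sequence with the most likely event model."""
    return max(models, key=lambda name: log_forward(seq, *models[name]))
```

Because each event class has its own model, supporting a new event type only requires training one more HMM; the maximum-likelihood decision rule stays the same.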

