research-article

Up-Fusion: An Evolving Multimedia Fusion Method

Published: 04 September 2014

Abstract

The amount of multimedia data on the Internet has increased exponentially in the past few decades, and this trend is likely to continue. Multimedia content inherently has multiple information sources, so effective fusion methods are critical for data analysis and understanding. Most existing fusion methods are static with respect to time, which makes it difficult for them to handle evolving multimedia content. Several evolving fusion methods have been proposed in recent years to address this issue, but their requirements are difficult to meet, which limits them to a narrow range of applications. In this article, we propose a novel evolving fusion method based on online portfolio selection theory. The proposed method takes into account the correlation among different information sources and evolves the fusion model when new multimedia data is added. It performs effectively on both crisp and soft decisions without requiring additional context information. Extensive experiments on concept detection and human detection tasks, over the TRECVID dataset and surveillance data, have been conducted, and significantly better performance has been obtained.
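To make the idea of an evolving fusion model concrete, the following is a minimal sketch of one well-known online portfolio selection update, the exponentiated-gradient rule, applied to fusing soft decisions from several sources. It is not the method proposed in this article: it ignores the correlation among sources, which the abstract says the proposed method accounts for. The function names, the `payoff` mapping from a score and a label to a positive "return", and the learning rate `eta` are all illustrative assumptions.

```python
import math


def eg_update(weights, payoffs, eta=0.5):
    """One exponentiated-gradient step over positive per-source payoffs.

    Sources with a higher payoff gain weight; the result sums to 1.
    """
    # Payoff of the current weighted combination (always > 0 here).
    combined = sum(w * x for w, x in zip(weights, payoffs))
    unnormalized = [
        w * math.exp(eta * x / combined) for w, x in zip(weights, payoffs)
    ]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def payoff(score, label, floor=1e-3):
    """Hypothetical payoff: close to 1 when the soft decision matches the label.

    The floor keeps payoffs strictly positive, as the update requires.
    """
    return max(floor, 1.0 - abs(score - label))


def fuse(weights, scores):
    """Fused soft decision as a weighted sum of per-source scores."""
    return sum(w * s for w, s in zip(weights, scores))


def run_stream(stream, n_sources, eta=0.5):
    """Fuse each incoming sample, then update the weights from its label."""
    weights = [1.0 / n_sources] * n_sources  # start with uniform weights
    fused = []
    for scores, label in stream:
        fused.append(fuse(weights, scores))
        weights = eg_update(weights, [payoff(s, label) for s in scores], eta)
    return weights, fused


if __name__ == "__main__":
    # Toy stream: source 0 always agrees with the label, source 1 never does.
    toy = [((1.0, 0.0), 1), ((0.0, 1.0), 0), ((1.0, 0.0), 1)]
    final_weights, fused_scores = run_stream(toy, n_sources=2)
    print(final_weights, fused_scores)
```

In this toy stream the weight of the consistently correct source grows with every labelled sample, so the fused score drifts toward that source over time. Accounting for correlation among sources would need more than this per-source update.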

Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 11, Issue 1
August 2014
151 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/2665935

Copyright © 2014 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

  • Published: 4 September 2014
  • Accepted: 1 April 2014
  • Revised: 1 March 2014
  • Received: 1 October 2013
