research-article

Appearance-consistent Video Object Segmentation Based on a Multinomial Event Model

Published: 05 June 2019

Abstract

In this study, we propose an effective and efficient algorithm for unconstrained video object segmentation, formulated as a labeling problem in a Markov random field (MRF). In the MRF graph, each node represents a superpixel and is labeled as either foreground or background during segmentation. The unary potential of each node is computed by a transductive SVM classifier trained under supervision from a few labeled frames. The pairwise potential enforces spatio-temporal smoothness. In addition, a high-order potential based on the multinomial event model is employed to enhance appearance consistency across frames. Because minimizing this high-order term is intractable, we also introduce a more efficient technique that simply extends the original MRF structure. The proposed approach was evaluated with several measures, and results on a benchmark demonstrated its effectiveness compared with other state-of-the-art algorithms.
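As a rough illustration of the kind of energy the abstract describes, the following minimal sketch scores a binary foreground/background labeling with unary costs plus a Potts-style pairwise penalty, and separately evaluates a multinomial event model likelihood over visual-word counts. The abstract does not give the paper's actual potentials, weights, or optimizer, so every name here (`mrf_energy`, `multinomial_nll`, `brute_force_map`, `smooth_weight`) and all numeric values are hypothetical. Exhaustive search stands in for a real solver and is only feasible for a handful of nodes.

```python
from itertools import product
from math import log


def mrf_energy(labels, unary, edges, smooth_weight=1.0):
    """Energy of a binary labeling: unary costs plus a Potts smoothness term.

    unary[i][l] is the (assumed) cost of giving node i label l,
    with 0 = background and 1 = foreground.
    edges is a list of (i, j, w) tuples; w is an assumed affinity weight.
    """
    energy = sum(unary[i][l] for i, l in enumerate(labels))
    # An edge is penalized only when its two endpoints take different labels.
    energy += sum(smooth_weight * w for i, j, w in edges if labels[i] != labels[j])
    return energy


def multinomial_nll(word_counts, word_probs):
    """Negative log-likelihood of visual-word counts under a multinomial
    event model. The multinomial coefficient is omitted because it does not
    depend on the model probabilities."""
    return -sum(c * log(p) for c, p in zip(word_counts, word_probs) if c > 0)


def brute_force_map(unary, edges, smooth_weight=1.0):
    """Exhaustively search all 2^n labelings (illustration only)."""
    n = len(unary)
    return min(
        product((0, 1), repeat=n),
        key=lambda x: mrf_energy(x, unary, edges, smooth_weight),
    )


# Toy chain of four superpixels; all costs and weights are made up.
unary = [[0.2, 1.5], [0.4, 0.9], [1.2, 0.3], [1.6, 0.1]]
edges = [(0, 1, 1.0), (1, 2, 0.2), (2, 3, 1.0)]
best = brute_force_map(unary, edges)
print(best, mrf_energy(best, unary, edges))
```

In this toy setting the weak edge between nodes 1 and 2 is the cheapest place to cut, so the smoothness term lets the labeling change there while keeping strongly connected neighbors consistent. A high-order appearance term such as `multinomial_nll` would be added to the energy per labeling; how the paper folds it into an extended MRF is not stated in the abstract and is not attempted here.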


        • Published in

          ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 15, Issue 2
          May 2019
          375 pages
          ISSN: 1551-6857
          EISSN: 1551-6865
          DOI: 10.1145/3339884

          Copyright © 2019 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 5 June 2019
          • Accepted: 1 February 2019
          • Revised: 1 January 2019
          • Received: 1 November 2018

          Qualifiers

          • research-article
          • Research
          • Refereed
