Abstract
In this study, we propose an effective and efficient algorithm for unconstrained video object segmentation, formulated as a labeling problem in a Markov random field (MRF). Each node of the MRF graph represents a superpixel and is labeled as either foreground or background during segmentation. The unary potential of each node is computed by a transductive SVM classifier trained under the supervision of a few labeled frames. The pairwise potential enforces spatial-temporal smoothness. In addition, a high-order potential based on the multinomial event model enhances appearance consistency across frames. Because this high-order term is intractable to minimize directly, we also introduce a more efficient technique that simply extends the original MRF structure. The proposed approach was evaluated with several measures on a benchmark, and the results demonstrate its effectiveness compared with other state-of-the-art algorithms.
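As a rough illustration of the kind of energy the abstract describes, the following sketch evaluates a binary superpixel labeling under a unary term and a smoothness term, and separately scores visual-word counts under a multinomial event model. The abstract does not give the exact potentials, so everything here is an assumption made for illustration: the function names `mrf_energy` and `multinomial_nll`, the Potts-style pairwise penalty, the toy costs, and the use of a negative log-likelihood as the appearance term are not taken from the paper.

```python
import math


def mrf_energy(labels, unary, edges, pairwise_weight=1.0):
    """Energy of a binary labeling: unary costs plus a Potts-style smoothness penalty.

    Assumed form, not the paper's definition.
    """
    # unary[i][l] is the cost of assigning label l to superpixel i
    # (0 = background, 1 = foreground).
    energy = sum(unary[i][label] for i, label in enumerate(labels))
    # Each edge links two spatially or temporally adjacent superpixels;
    # a fixed penalty is paid whenever their labels disagree.
    energy += sum(pairwise_weight for i, j in edges if labels[i] != labels[j])
    return energy


def multinomial_nll(word_counts, word_probs):
    """Negative log-likelihood of word counts under a multinomial event model.

    The count-dependent normalizing constant is dropped. Hypothetical stand-in
    for an appearance-consistency term.
    """
    return -sum(c * math.log(p) for c, p in zip(word_counts, word_probs) if c > 0)


# Toy example: three superpixels connected in a chain.
unary = [[0.2, 1.5], [1.0, 0.3], [0.9, 0.4]]
edges = [(0, 1), (1, 2)]
labels = [0, 1, 1]

energy = mrf_energy(labels, unary, edges)
appearance_cost = multinomial_nll([2, 1], [0.5, 0.5])
print(energy, appearance_cost)
```

In a sketch like this, a lower energy means the labeling agrees better with the classifier scores and with its neighbors. A real system would minimize the energy with a graph-based solver rather than evaluate one labeling at a time.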
Index Terms
Appearance-consistent Video Object Segmentation Based on a Multinomial Event Model