Abstract
Weakly supervised semantic segmentation under image-level annotations is effective for real-world applications. The small, sparse discriminative regions obtained from an image classification network typically serve as the initial localization cues for semantic segmentation, but they also form its bottleneck. Although deep convolutional neural networks (DCNNs) have shown promising performance on single-label image classification, real-world images usually contain multiple categories, and multi-label classification is still an open problem. Consequently, obtaining high-confidence discriminative regions from multi-label classification networks remains unsolved. To address this problem, this article proposes a three-step framework from the perspective of multi-object proposal generation. First, an image is divided into candidate boxes using an object proposal method, and the candidate boxes are sent to a single-label classification network to obtain the discriminative regions. Second, the discriminative regions are aggregated to obtain a high-confidence seed map. Third, the seed cues are grown over the high-level semantic feature maps produced by a backbone segmentation network. Experiments on the PASCAL VOC 2012 dataset verify the effectiveness of our approach, which is shown to outperform other baseline image segmentation methods.
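The second and third steps can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify how regions are aggregated or grown, so the max-score aggregation rule, the 4-neighbour growing rule, the thresholds, and the names `aggregate_seeds` and `grow_seeds` are all assumptions made for illustration. Each candidate box is assumed to come with a predicted label and an activation map already scaled to [0, 1], and the backbone is assumed to supply per-class probabilities of shape (C, H, W).

```python
import numpy as np
from collections import deque

IGNORE = 255  # hypothetical marker for pixels that have no seed label


def aggregate_seeds(shape, boxes, box_maps, box_labels, threshold=0.8):
    """Paste per-box activation maps into image coordinates and keep
    only high-confidence pixels as seeds (assumed aggregation rule)."""
    h, w = shape
    best_score = np.zeros((h, w), dtype=np.float32)
    seeds = np.full((h, w), IGNORE, dtype=np.int64)
    for (y0, x0, y1, x1), cam, label in zip(boxes, box_maps, box_labels):
        # Basic slices are views, so masked writes update the full maps.
        region_score = best_score[y0:y1, x0:x1]
        region_seed = seeds[y0:y1, x0:x1]
        # Keep a pixel if it is confident and beats every earlier box.
        take = (cam >= threshold) & (cam > region_score)
        region_score[take] = cam[take]
        region_seed[take] = label
    return seeds


def grow_seeds(seeds, probs, grow_threshold=0.6):
    """Breadth-first seeded region growing: an unlabeled 4-neighbour joins
    a seed's class when the class probability there is high enough."""
    h, w = seeds.shape
    grown = seeds.copy()
    queue = deque(zip(*np.nonzero(grown != IGNORE)))
    while queue:
        y, x = queue.popleft()
        label = grown[y, x]
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if not (0 <= ny < h and 0 <= nx < w):
                continue
            if grown[ny, nx] == IGNORE and probs[label, ny, nx] >= grow_threshold:
                grown[ny, nx] = label
                queue.append((ny, nx))
    return grown


if __name__ == "__main__":
    # Toy input: one 2x2 box labeled class 1 in a 4x4 image.
    cam = np.array([[0.9, 0.1], [0.1, 0.1]], dtype=np.float32)
    seeds = aggregate_seeds((4, 4), [(0, 0, 2, 2)], [cam], [1])
    probs = np.zeros((2, 4, 4), dtype=np.float32)
    probs[1, 0, :] = 0.7  # class 1 is likely along the top row only
    print(grow_seeds(seeds, probs))
```

In this toy run a single confident pixel becomes the seed, and growing spreads its label only along the row where the class probability clears the threshold. In a real pipeline the grown map would act as pseudo-labels for training the segmentation network.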
Index Terms
A Weakly Supervised Semantic Segmentation Network by Aggregating Seed Cues: The Multi-Object Proposal Generation Perspective