A Weakly Supervised Semantic Segmentation Network by Aggregating Seed Cues: The Multi-Object Proposal Generation Perspective

Published: 31 March 2021

Abstract

Weakly supervised semantic segmentation under image-level annotations is effective for real-world applications. The small, sparse discriminative regions obtained from an image classification network are typically used as the initial localization cues for semantic segmentation, but they also form its bottleneck. Although deep convolutional neural networks (DCNNs) have shown promising performance on single-label image classification tasks, real-world images usually contain multiple categories, and classifying such images is still an open problem. Consequently, obtaining high-confidence discriminative regions from multi-label classification networks remains unsolved. To address this problem, this article proposes a three-step framework from the perspective of multi-object proposal generation. First, an image is divided into candidate boxes using an object proposal method, and the candidate boxes are sent to a single-label classification network to obtain the discriminative regions. Second, the discriminative regions are aggregated to obtain a high-confidence seed map. Third, the seed cues are grown on the high-level semantic feature maps produced by a backbone segmentation network. Experiments on the PASCAL VOC 2012 dataset verify the effectiveness of the approach, which is shown to outperform other baseline image segmentation methods.
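As a rough illustration of the second step (aggregating per-box discriminative regions into a seed map), the following is a minimal sketch and not the authors' implementation. The function name `aggregate_seed_cues`, the box format `(x0, y0, x1, y1)`, the per-pixel maximum used for aggregation, the confidence threshold of 0.5, and the background label -1 are all assumptions made for this example; the abstract does not specify them.

```python
def aggregate_seed_cues(image_shape, boxes, box_scores, threshold=0.5, background=-1):
    """Hypothetical aggregation of per-box class activation maps into a seed map.

    image_shape: (height, width) of the full image.
    boxes:       list of (x0, y0, x1, y1) candidate boxes (assumed format).
    box_scores:  for each box, a list indexed [class][row][col] holding the
                 classifier's activation for that class inside the box.
    Returns a height x width grid of class indices, or `background`
    where no class reaches the confidence threshold.
    """
    height, width = image_shape
    num_classes = len(box_scores[0])

    # Per-class confidence over the full image, initialised to zero.
    confidence = [[[0.0] * width for _ in range(height)] for _ in range(num_classes)]

    # Paste each box's activations back into image coordinates,
    # keeping the maximum where boxes overlap (an assumed aggregation rule).
    for (x0, y0, x1, y1), scores in zip(boxes, box_scores):
        for c in range(num_classes):
            for y in range(y0, y1):
                for x in range(x0, x1):
                    value = scores[c][y - y0][x - x0]
                    if value > confidence[c][y][x]:
                        confidence[c][y][x] = value

    # Keep only pixels whose best class is confident enough.
    seed_map = [[background] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            best = max(range(num_classes), key=lambda c: confidence[c][y][x])
            if confidence[best][y][x] >= threshold:
                seed_map[y][x] = best
    return seed_map
```

In this sketch, pixels covered by no confident box stay unlabeled, which leaves them to be filled in by the region-growing step that the abstract describes as the third stage.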
