
Affinity Derivation for Accurate Instance Segmentation

Published: 16 April 2021

Abstract

Affinity, which indicates whether two pixels belong to the same instance, is a representation equivalent to instance segmentation labels. Prior work has not explored affinity explicitly. In this article, we present two instance segmentation schemes based on pixel affinity and show that affinity is effective in both settings. For the proposal-free method, we predict pixel affinity for each image and then apply a simple yet effective graph merge algorithm to cluster pixels into instances. This shows that affinity is a powerful instance-relevant cue for guiding the clustering procedure in proposal-free instance segmentation. For proposal-based methods, we extend the conventional framework with an affinity head and introduce affinity as additional supervision during training. This improves the performance of existing proposal-based instance segmentation methods without any additional inference cost, which shows that affinity can also serve as an auxiliary loss and that training with this extra loss benefits the training process. Experimental results show that our schemes achieve performance comparable to other state-of-the-art instance segmentation methods. Trained on Cityscapes training data, the proposal-free method achieves 28.8 AP and the proposal-based method achieves 27.2 AP, both on the test set.
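
The relationship between instance labels and pixel affinity, and the idea of merging pixels along high-affinity edges, can be illustrated with a small sketch. This is not the article's graph merge algorithm, whose details are not given in the abstract. It assumes affinities only between 4-connected neighbours, a known foreground mask, a fixed threshold of 0.5, and a plain union-find merge. The function names `affinity_from_labels` and `merge_by_affinity` are hypothetical. Under these assumptions the labels are recovered exactly only when every instance is 4-connected.

```python
import numpy as np


def affinity_from_labels(labels):
    """Derive neighbour affinities from an instance label map (0 = background).

    Returns two boolean maps: right[y, x] is True when pixels (y, x) and
    (y, x + 1) share a non-zero instance id, and down[y, x] likewise for
    (y, x) and (y + 1, x).
    """
    right = (labels[:, :-1] == labels[:, 1:]) & (labels[:, :-1] > 0)
    down = (labels[:-1, :] == labels[1:, :]) & (labels[:-1, :] > 0)
    return right, down


def merge_by_affinity(right, down, foreground, threshold=0.5):
    """Cluster foreground pixels by merging neighbours with high affinity.

    Illustrative union-find merge (an assumption, not the published method).
    Instance ids are assigned in raster-scan order, starting at 1.
    """
    h, w = foreground.shape
    parent = list(range(h * w))

    def find(i):
        # Path halving keeps the trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for y in range(h):
        for x in range(w):
            if not foreground[y, x]:
                continue
            if x + 1 < w and foreground[y, x + 1] and right[y, x] > threshold:
                union(y * w + x, y * w + x + 1)
            if y + 1 < h and foreground[y + 1, x] and down[y, x] > threshold:
                union(y * w + x, (y + 1) * w + x)

    instances = np.zeros((h, w), dtype=np.int64)
    ids = {}
    for y in range(h):
        for x in range(w):
            if foreground[y, x]:
                root = find(y * w + x)
                instances[y, x] = ids.setdefault(root, len(ids) + 1)
    return instances
```

In a learned setting, `right` and `down` would hold predicted affinity probabilities rather than values derived from ground truth, and the foreground mask would typically come from a semantic segmentation output. Both are assumptions of this sketch.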

Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 1
February 2021
392 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3453992

Copyright © 2021 Association for Computing Machinery.

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 16 April 2021
• Accepted: 1 June 2020
• Revised: 1 April 2020
• Received: 1 October 2019

Qualifiers

• research-article
• Refereed