Abstract
Affinity, which indicates whether two pixels belong to the same instance, is a representation equivalent to instance segmentation labels. Conventional works have not explored affinity explicitly. In this article, we present two instance segmentation schemes based on pixel affinity information and show the effectiveness of affinity in both settings. For the proposal-free method, we predict pixel affinity for each image and then propose a simple yet effective graph merge algorithm to cluster pixels into instances. This shows that affinity is powerful instance-relevant information for guiding the clustering procedure in proposal-free instance segmentation. For proposal-based methods, we extend the conventional framework with an affinity head and introduce affinity as additional supervision in the training phase. Without any additional inference cost, we improve the performance of existing proposal-based instance segmentation methods, which shows that affinity can also be applied as an auxiliary loss and that training with this extra loss benefits the training process. Experimental results show that our schemes achieve performance comparable to other state-of-the-art instance segmentation methods. With Cityscapes training data, the proposed proposal-free method achieves 28.8 AP and the proposal-based method achieves 27.2 AP, both on the test set.
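To make the relationship between instance labels and affinity concrete, the following is a minimal sketch, not the article's graph merge algorithm. It derives binary affinity maps from an instance label map and then groups pixels back into instances by plain thresholded union-find. The function names, the two neighbor offsets, and the 0.5 threshold are illustrative assumptions.

```python
import numpy as np


def affinity_from_labels(labels, offsets=((0, 1), (1, 0))):
    """One boolean map per (dy, dx) offset (assumed non-negative): True where
    a pixel and its neighbor share the same non-zero instance id."""
    h, w = labels.shape
    maps = []
    for dy, dx in offsets:
        aff = np.zeros((h, w), dtype=bool)
        a = labels[: h - dy, : w - dx]
        b = labels[dy:, dx:]
        aff[: h - dy, : w - dx] = (a == b) & (a > 0)
        maps.append(aff)
    return np.stack(maps)


def merge_by_affinity(affinity, offsets=((0, 1), (1, 0)), threshold=0.5):
    """Simplified stand-in for a graph merge step: union every pixel pair
    whose affinity exceeds the threshold, then return a root id per pixel."""
    _, h, w = affinity.shape
    parent = np.arange(h * w)

    def find(i):
        # Path-halving lookup of the representative pixel.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for k, (dy, dx) in enumerate(offsets):
        ys, xs = np.nonzero(affinity[k] > threshold)
        for y, x in zip(ys, xs):
            if y + dy < h and x + dx < w:
                ra = find(y * w + x)
                rb = find((y + dy) * w + (x + dx))
                if ra != rb:
                    parent[rb] = ra

    roots = np.array([find(i) for i in range(h * w)])
    return roots.reshape(h, w)


# Toy label map: two instances (ids 1 and 2) on a background of zeros.
labels = np.array([[1, 1, 0, 2],
                   [1, 1, 0, 2],
                   [0, 0, 0, 2]])
aff = affinity_from_labels(labels)
seg = merge_by_affinity(aff)
```

In this toy case the ground-truth affinity recovers the original grouping, which illustrates why affinity can stand in for instance labels. With predicted, noisy affinities a simple threshold would be far less robust than a dedicated merge procedure.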