Research Article
Query-Guided Prototype Learning with Decoder Alignment and Dynamic Fusion in Few-Shot Segmentation

Published: 15 March 2023

Abstract

Few-shot segmentation aims to segment objects of a specific class under the guidance of a few annotated examples. Most existing approaches follow the prototype learning paradigm and generate category prototypes by squeezing masked feature maps extracted from images in the support set. Because of the considerable distribution discrepancy between support and query features, directly comparing these support prototypes with features extracted from the query set may lead to inaccurate predictions. We propose a query-guided prototype learning architecture that addresses this problem from two aspects: (i) we introduce a cross-alignment loss for training the segmentation decoder, which improves the decoder's robustness to the distribution discrepancy between support and query features; and (ii) we build a dynamic fusion module that strengthens the original support prototype with a second prototype extracted from query features. Experiments show that our method achieves promising results compared with previous prototype learning methods on the PASCAL-5i and COCO-20i datasets.
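The prototype pipeline the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function names, the masked-average-pooling choice for "squeezing" masked feature maps, the cosine-similarity comparison, and the scalar blending weight `alpha` in the fusion step are all illustrative assumptions.

```python
import numpy as np

def masked_average_pooling(features, mask):
    """Squeeze a masked feature map (C, H, W) into a class prototype (C,).

    Only spatial locations where mask == 1 (the annotated object)
    contribute to the prototype.
    """
    mask = mask.astype(features.dtype)           # (H, W) binary mask
    weighted = features * mask[None, :, :]       # zero out background
    return weighted.sum(axis=(1, 2)) / (mask.sum() + 1e-8)

def cosine_similarity_map(features, prototype):
    """Compare every location of a query feature map (C, H, W) with a
    prototype (C,); higher values suggest foreground."""
    f = features / (np.linalg.norm(features, axis=0, keepdims=True) + 1e-8)
    p = prototype / (np.linalg.norm(prototype) + 1e-8)
    return np.einsum('chw,c->hw', f, p)          # (H, W) similarity map

def dynamic_fusion(support_proto, query_proto, alpha=0.5):
    """Strengthen the support prototype with a query-derived prototype.

    A learned, input-dependent weight would replace the fixed `alpha`
    in a trainable module; a convex combination is shown here.
    """
    return alpha * support_proto + (1.0 - alpha) * query_proto
```

In this sketch the query prototype would itself be obtained by masked average pooling over query features using an initial (coarse) prediction as the mask, which is what makes the fused prototype query-guided.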



Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 2s
April 2023, 545 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3572861
Editor: Abdulmotaleb El Saddik

Publisher: Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 15 March 2023
• Online AM: 12 August 2022
• Accepted: 23 July 2022
• Revised: 21 July 2022
• Received: 28 December 2021

