research-article

Hierarchical and Progressive Image Matting

Published: 06 February 2023

Abstract

Most matting research relies on high-level semantics to achieve high-quality alpha mattes and typically combines low-level features directly to recover alpha details. However, we argue that such appearance-agnostic integration can only provide biased foreground (FG) details, and that alpha mattes require aggregating features from different levels for better pixel-wise opacity perception. In this article, we propose an end-to-end hierarchical and progressive attention matting network (HAttMatting++), which better predicts the opacity of the FG from a single RGB image without additional input. Specifically, we use channel-wise attention (CA) to distill pyramidal features and employ spatial attention (SA) at different levels to filter appearance cues. This progressive attention mechanism estimates alpha mattes from adaptive semantics and semantics-indicated boundaries. We also introduce a hybrid loss function that combines structural similarity (SSIM), mean squared error (MSE), adversarial loss, and sentry supervision to guide the network toward a better overall FG structure. In addition, we construct a large-scale, challenging image matting dataset consisting of 59,000 training images and 1,000 test images (646 distinct FG alpha mattes in total), which further improves the robustness of our hierarchical and progressive aggregation model. Extensive experiments demonstrate that HAttMatting++ captures sophisticated FG structures and achieves state-of-the-art performance with single RGB images as input.
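
The abstract names the two attention mechanisms but does not specify their internals. As a rough illustration only, the pure-Python sketch below shows generic versions of each: an SE-style channel attention (global average pooling, a bottleneck MLP, one sigmoid gate per channel) and a CBAM-style spatial mask built from the per-pixel channel mean and max. These are standard textbook formulations, not the HAttMatting++ modules. The weights `w1`, `w2`, `a`, `b`, `bias`, the 1x1 weighting used in place of a larger convolution, and the `guide`/`target` split (mask computed from distilled high-level features, applied to low-level appearance features) are all hypothetical choices made for this sketch.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def channel_attention(feats, w1, w2):
    """SE-style channel attention (illustrative, not the paper's module).

    feats: list of C channels, each an H x W list of rows.
    w1: (C // r) x C bottleneck weights; w2: C x (C // r) expansion weights.
    Returns the rescaled feature map and the per-channel gates in (0, 1).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feats]
    # Excite: bottleneck MLP with ReLU, then a sigmoid gate per channel.
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Reweight every channel by its gate.
    scaled = [[[g * v for v in row] for row in ch] for g, ch in zip(gates, feats)]
    return scaled, gates


def spatial_attention(guide, target, a, b, bias=0.0):
    """CBAM-style spatial mask (illustrative).

    The mask is computed from `guide` (a stand-in for distilled high-level
    semantics) and applied to `target` (a stand-in for low-level appearance
    cues); both must share the same H x W. A 1x1 weighting of the channel
    mean and max replaces the larger convolution that CBAM uses.
    """
    h, w = len(guide[0]), len(guide[0][0])
    mask = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            column = [ch[i][j] for ch in guide]
            avg = sum(column) / len(column)
            mask[i][j] = sigmoid(a * avg + b * max(column) + bias)
    # Suppress target responses wherever the guide mask is low.
    filtered = [[[mask[i][j] * ch[i][j] for j in range(w)] for i in range(h)]
                for ch in target]
    return filtered, mask


if __name__ == "__main__":
    # Two 2x2 channels; the weights are arbitrary illustrative values.
    feats = [[[1.0, 3.0], [5.0, 7.0]], [[0.0, 0.0], [0.0, 0.0]]]
    distilled, gates = channel_attention(feats, w1=[[1.0, 1.0]], w2=[[1.0], [-1.0]])
    filtered, mask = spatial_attention(distilled, feats, a=1.0, b=0.0)
    print("channel gates:", gates)
    print("spatial mask:", mask)
```

Because every gate is a sigmoid output in (0, 1), both modules in this sketch can only rescale responses downward, which is one simple way to read "distilling" features and "filtering" appearance cues; a real network would learn the weights and operate on tensors rather than nested lists.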

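The abstract lists four loss terms but not how they are computed or weighted. The sketch below combines three of them under stated assumptions: an SSIM term following Wang et al. (2004) but evaluated over the whole matte as a single window (the standard metric averages over local Gaussian windows), a mean squared error term, and the non-saturating GAN generator term -log D(G(x)). The weights `w_ssim`, `w_mse`, and `w_adv` are hypothetical placeholders, the discriminator output `d_fake` is simply passed in as a probability, and the sentry-supervision term is omitted because the abstract does not define it.

```python
import math


def mse(pred, gt):
    """Mean squared error between two flattened alpha mattes."""
    return sum((p - g) * (p - g) for p, g in zip(pred, gt)) / len(gt)


def global_ssim(x, y, data_range=1.0):
    """SSIM over the whole matte as one window (simplified)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) * (a - mx) for a in x) / n
    vy = sum((b - my) * (b - my) for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx * mx + my * my + c1) * (vx + vy + c2))


def generator_adv_loss(d_fake, eps=1e-12):
    """Non-saturating GAN generator term: -log D(G(x)).

    d_fake is the discriminator's probability that the predicted matte is real.
    """
    return -math.log(max(d_fake, eps))


def hybrid_loss(pred, gt, d_fake, w_ssim=1.0, w_mse=1.0, w_adv=0.01):
    """Weighted sum of three of the four terms named in the abstract.

    The weights are hypothetical; sentry supervision is not modeled here.
    """
    return (w_ssim * (1.0 - global_ssim(pred, gt))
            + w_mse * mse(pred, gt)
            + w_adv * generator_adv_loss(d_fake))


if __name__ == "__main__":
    gt = [0.0, 0.0, 0.5, 1.0, 1.0, 1.0]      # toy flattened ground-truth matte
    blurry = [0.2, 0.3, 0.5, 0.7, 0.8, 0.8]  # toy prediction with soft edges
    print("perfect:", hybrid_loss(gt, gt, d_fake=1.0))
    print("blurry: ", hybrid_loss(blurry, gt, d_fake=0.3))
```

The SSIM term penalizes structural differences (mean, contrast, correlation) that a per-pixel MSE can miss, which is a plausible reason to pair the two when the goal is a better overall FG structure.
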
    • Published in

      ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 2
      March 2023
      540 pages
      ISSN:1551-6857
      EISSN:1551-6865
      DOI:10.1145/3572860
      • Editor: Abdulmotaleb El Saddik

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 6 February 2023
      • Online AM: 11 June 2022
      • Accepted: 23 May 2022
      • Revised: 11 May 2022
      • Received: 29 August 2021

      Qualifiers

      • research-article
      • Refereed
