research-article

Deeply Activated Salient Region for Instance Search

Published: 01 November 2022

Abstract

The performance of instance search relies heavily on the ability to locate and describe a wide variety of object instances in a video/image collection. Due to the lack of a proper mechanism for locating instances and deriving their feature representations, instance search is generally effective only when the instances come from known object categories. In this article, a simple but effective instance-level feature representation approach is presented. Different from existing approaches, both class-agnostic instance localization and distinctive feature representation are addressed. The former is achieved by detecting salient instance regions in an image through a layer-wise back-propagation process. The back-propagation starts from the last convolution layer of a pre-trained CNN that was originally trained for classification, and proceeds layer by layer until it reaches the input layer. This allows salient instance regions of both known and unknown categories in the input image to be activated. Each activated salient region covers the full extent or, more usually, a major part of an instance. The distinctive feature representation is then produced by average-pooling the feature map of a certain layer within the detected instance region. Experiments show that this feature representation achieves considerably better performance than most existing approaches.
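The final pooling step described in the abstract can be sketched as follows. This is an illustrative NumPy sketch, not code from the paper: the function name is our own, and we assume the salient region is already available as a binary mask (in the paper it is derived by layer-wise back-propagation through a pre-trained CNN). The descriptor is the per-channel average of the feature map restricted to the masked region, L2-normalized for cosine-similarity search.

```python
import numpy as np

def region_pooled_descriptor(feature_map, region_mask):
    """Average-pool a C x H x W convolutional feature map over a binary
    H x W salient-region mask, then L2-normalize the result.
    Falls back to global average pooling when the mask is empty.
    (Hypothetical helper for illustration; not from the paper.)"""
    c = feature_map.shape[0]
    mask = region_mask.astype(feature_map.dtype)
    area = mask.sum()
    if area == 0:
        # No detected region: pool over the whole feature map.
        pooled = feature_map.reshape(c, -1).mean(axis=1)
    else:
        # Mask broadcasts over the channel axis; average within the region.
        pooled = (feature_map * mask).reshape(c, -1).sum(axis=1) / area
    norm = np.linalg.norm(pooled)
    return pooled / norm if norm > 0 else pooled
```

Because the descriptor is a single L2-normalized vector per instance region, ranking candidate instances reduces to dot products against the query descriptor.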


• Published in

  ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 18, Issue 3s
  October 2022
  381 pages
  ISSN: 1551-6857
  EISSN: 1551-6865
  DOI: 10.1145/3567476
  Editor: Abdulmotaleb El Saddik


  Publisher

  Association for Computing Machinery, New York, NY, United States

  Publication History

  • Published: 1 November 2022
  • Online AM: 18 February 2022
  • Accepted: 22 December 2021
  • Revised: 16 November 2021
  • Received: 6 July 2020

  Published in TOMM Volume 18, Issue 3s

