DOI: 10.1145/3394171.3413828 · ACM Multimedia conference proceedings · Research article

Occluded Prohibited Items Detection: An X-ray Security Inspection Benchmark and De-occlusion Attention Module

Published: 12 October 2020

ABSTRACT

Security inspection often deals with baggage or suitcases in which objects heavily overlap with each other, resulting in unsatisfactory performance for prohibited item detection in X-ray images. In the literature, studies and datasets touching this important topic have been rare. In this work, we contribute the first high-quality object detection dataset for security inspection, named the Occluded Prohibited Items X-ray (OPIXray) image benchmark. OPIXray focuses on the widely occurring prohibited item "cutter", annotated manually by professional inspectors from an international airport. The test set is further divided into three occlusion levels to better characterize the performance of detectors. Furthermore, to deal with occlusion in X-ray image detection, we propose the De-occlusion Attention Module (DOAM), a plug-and-play module that can be easily inserted into, and thus improve, most popular detectors. Despite the heavy occlusion in X-ray imaging, the shape appearance of objects is preserved well, and meanwhile different materials visually appear with different colors and textures. Motivated by these observations, DOAM simultaneously leverages these different appearance cues of the prohibited item to generate an attention map, which helps refine feature maps for general detectors. We comprehensively evaluate our module on the OPIXray dataset and demonstrate that it consistently improves the performance of state-of-the-art detection methods such as SSD and FCOS, and significantly outperforms several widely used attention mechanisms. In particular, the advantages of DOAM are more pronounced in scenarios with higher levels of occlusion, which demonstrates its potential for application in real-world inspections. The OPIXray benchmark and our model are released at https://github.com/OPIXray-author/OPIXray.
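The abstract describes DOAM's core mechanism: fuse a shape (edge) cue and a material (color/texture) cue into an attention map, then use that map to gate the detector's feature maps. The paper's actual module is not reproduced here; the following is a minimal NumPy sketch of that gating idea only. All function names, the grayscale stand-in for the material cue, and the fusion weight are illustrative assumptions, not details from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def edge_map(gray):
    """Simple gradient magnitude as a stand-in for the shape/edge cue."""
    gx = np.zeros_like(gray)
    gy = np.zeros_like(gray)
    gx[:, 1:] = gray[:, 1:] - gray[:, :-1]   # horizontal gradient
    gy[1:, :] = gray[1:, :] - gray[:-1, :]   # vertical gradient
    return np.abs(gx) + np.abs(gy)

def doam_style_refine(features, gray, weight=0.5):
    """Gate a feature map with a fused shape+material attention map.

    features: (C, H, W) detector feature map
    gray:     (H, W) grayscale X-ray image, standing in for the
              material (color/texture) cue of a pseudo-color scan
    weight:   illustrative balance between the two cues
    """
    shape_cue = edge_map(gray)
    material_cue = gray
    fused = weight * shape_cue + (1.0 - weight) * material_cue
    # Center, then squash to (0, 1): the attention map.
    attn = sigmoid(fused - fused.mean())
    # Broadcast the (H, W) map over all channels to refine the features.
    return features * attn[None, :, :]
```

In the paper the cues are learned and the module is inserted inside a CNN detector; this sketch only shows why an attention map computed this way preserves responses where edges and distinctive material appearance coincide, and suppresses them elsewhere.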

Supplemental Material

3394171.3413828.mp4


References

  1. Arjun Chaudhary, Abhishek Hazra, and Prakash Chaudhary. 2019. Diagnosis of Chest Diseases in X-Ray Images Using Deep Convolutional Neural Network. In 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT). IEEE, 1--6.
  2. Long Chen, Hanwang Zhang, Jun Xiao, Liqiang Nie, Jian Shao, Wei Liu, and Tat-Seng Chua. 2017. SCA-CNN: Spatial and channel-wise attention in convolutional networks for image captioning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5659--5667.
  3. Zhi-Qi Cheng, Jun-Xiu Li, Qi Dai, Xiao Wu, Jun-Yan He, and Alexander G. Hauptmann. 2019. Improving the learning of multi-column convolutional neural network for crowd counting. In Proceedings of the 27th ACM International Conference on Multimedia. 1897--1906.
  4. Jun Fu, Jing Liu, Haijie Tian, Yong Li, Yongjun Bao, Zhiwei Fang, and Hanqing Lu. 2019. Dual attention network for scene segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3146--3154.
  5. Shiming Ge, Jia Li, Qiting Ye, and Zhao Luo. 2017. Detecting masked faces in the wild with LLE-CNNs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2682--2690.
  6. Shuai Guo, Songyuan Tang, Jianjun Zhu, Jingfan Fan, Danni Ai, Hong Song, Ping Liang, and Jian Yang. 2019. Improved U-Net for Guidewire Tip Segmentation in X-ray Fluoroscopy Images. In Proceedings of the 2019 3rd International Conference on Advances in Image Processing. 55--59.
  7. Jie Hu, Li Shen, and Gang Sun. 2018. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 7132--7141.
  8. Shengling Huang, Xin Wang, Yifan Chen, Jie Xu, Tian Tang, and Baozhong Mu. 2019. Modeling and quantitative analysis of X-ray transmission and backscatter imaging aimed at security inspection. Optics Express, Vol. 27, 2 (2019), 337--349.
  9. Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature, Vol. 521, 7553 (2015), 436--444.
  10. Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. 2016. SSD: Single shot multibox detector. In European Conference on Computer Vision. Springer, 21--37.
  11. Jianjie Lu and Kai-yu Tong. 2019. Towards to Reasonable Decision Basis in Automatic Bone X-Ray Image Classification: A Weakly-Supervised Approach. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 9985--9986.
  12. Domingo Mery, Vladimir Riffo, Uwe Zscherpel, German Mondragón, Iván Lillo, Irene Zuccar, Hans Lobel, and Miguel Carrasco. 2015. GDXray: The database of X-ray images for nondestructive testing. Journal of Nondestructive Evaluation, Vol. 34, 4 (2015), 42.
  13. Caijing Miao, Lingxi Xie, Fang Wan, Chi Su, Hongye Liu, Jianbin Jiao, and Qixiang Ye. 2019. SIXray: A large-scale security inspection X-ray benchmark for prohibited item discovery in overlapping images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2119--2128.
  14. Liang Peng, Yang Yang, Zheng Wang, Xiao Wu, and Zi Huang. 2019. CRA-Net: Composed Relation Attention Network for Visual Question Answering. In Proceedings of the 27th ACM International Conference on Multimedia. 1202--1210.
  15. Joseph Redmon and Ali Farhadi. 2018. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767 (2018).
  16. Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. 2017. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision. 618--626.
  17. Lingxue Song, Dihong Gong, Zhifeng Li, Changsong Liu, and Wei Liu. 2019. Occlusion Robust Face Recognition Based on Mask Learning With Pairwise Differential Siamese Network. In Proceedings of the IEEE International Conference on Computer Vision. 773--782.
  18. Xie Sun, Lu Jin, and Zechao Li. 2019. Attention-Aware Feature Pyramid Ordinal Hashing for Image Retrieval. In Proceedings of the ACM Multimedia Asia. 1--6.
  19. Jinhui Tang, Lu Jin, Zechao Li, and Shenghua Gao. 2015. RGB-D object recognition via incorporating latent data structure and prior knowledge. IEEE Transactions on Multimedia, Vol. 17, 11 (2015), 1899--1908.
  20. Zhi Tian, Chunhua Shen, Hao Chen, and Tong He. 2019. FCOS: Fully convolutional one-stage object detection. In Proceedings of the IEEE International Conference on Computer Vision. 9627--9636.
  21. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems. 5998--6008.
  22. Pichao Wang, Zhaoyang Li, Yonghong Hou, and Wanqing Li. 2016. Action recognition based on joint trajectory maps using convolutional neural networks. In Proceedings of the 24th ACM International Conference on Multimedia. 102--106.
  23. Xiaolong Wang, Ross Girshick, Abhinav Gupta, and Kaiming He. 2018a. Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 7794--7803.
  24. Xinlong Wang, Tete Xiao, Yuning Jiang, Shuai Shao, Jian Sun, and Chunhua Shen. 2018b. Repulsion loss: Detecting pedestrians in a crowd. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 7774--7783.
  25. Jian Yang, Lei Luo, Jianjun Qian, Ying Tai, Fanlong Zhang, and Yong Xu. 2016. Nuclear norm based matrix regression with applications to face recognition with occlusion and illumination changes. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 39, 1 (2016), 156--171.
  26. Xingxu Yao, Dongyu She, Sicheng Zhao, Jie Liang, Yu-Kun Lai, and Jufeng Yang. 2019. Attention-aware polarity sensitive embedding for affective image retrieval. In Proceedings of the IEEE International Conference on Computer Vision. 1140--1150.
  27. Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, and Thomas S. Huang. 2019. Free-form image inpainting with gated convolution. In Proceedings of the IEEE International Conference on Computer Vision. 4471--4480.
  28. Zheng-Jun Zha, Jiawei Liu, Tianhao Yang, and Yongdong Zhang. 2019. Spatiotemporal-Textual Co-Attention Network for Video Question Answering. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Vol. 15, 2s (2019), 1--18.
  29. Shifeng Zhang, Longyin Wen, Xiao Bian, Zhen Lei, and Stan Z. Li. 2018. Occlusion-aware R-CNN: Detecting pedestrians in a crowd. In Proceedings of the European Conference on Computer Vision (ECCV). 637--653.
  30. Chunluan Zhou and Junsong Yuan. 2018. Bi-box regression for pedestrian detection and occlusion estimation. In Proceedings of the European Conference on Computer Vision (ECCV). 135--151.
