research-article

xCos: An Explainable Cosine Metric for Face Verification Task

Published: 15 November 2021

Abstract

We study explainable AI (XAI) for face recognition, and for face verification in particular. Face verification has become a crucial task and is deployed in many applications, such as access control, surveillance, and automatic personal log-on for mobile devices. With the increasing amount of data, deep convolutional neural networks can achieve very high accuracy on face verification. Beyond strong performance, deep face verification models need to be more interpretable so that we can trust the results they generate. In this article, we propose a novel similarity metric, called explainable cosine (xCos), that comes with a learnable module that can be plugged into most verification models to provide meaningful explanations. With xCos, we can see which parts of the two input faces are similar, where the model pays attention, and how the local similarities are weighted to form the output xCos score. We demonstrate the effectiveness of the proposed method on LFW and various competitive benchmarks: it provides novel and desirable interpretability for face verification while preserving accuracy when plugged into existing face recognition models.
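The weighting described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the flat list of patch features, and the softmax normalisation of the attention weights are all assumptions made for illustration. It only shows the general idea of combining patch-wise cosine similarities with attention weights into a single score.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def xcos_sketch(patches_a, patches_b, attention_logits):
    """Attention-weighted sum of patch-wise cosine similarities.

    patches_a, patches_b: local feature vectors, one per face region.
    attention_logits: one unnormalised weight per region. This is a
    hypothetical input standing in for the output of the learnable
    module mentioned in the abstract.
    """
    # Local similarity for each pair of corresponding regions.
    local_sims = [cosine(a, b) for a, b in zip(patches_a, patches_b)]

    # Softmax so the weights are positive and sum to 1.
    # (Assumption for this sketch, not taken from the source.)
    peak = max(attention_logits)
    exps = [math.exp(x - peak) for x in attention_logits]
    total = sum(exps)
    weights = [e / total for e in exps]

    # The final score is the weighted sum of the local similarities.
    score = sum(w * s for w, s in zip(weights, local_sims))
    return score, local_sims, weights


# Two regions: the first matches exactly, the second is orthogonal.
score, local_sims, weights = xcos_sketch(
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 0.0]],
    [0.0, 0.0],
)
```

Returning the local similarities and weights alongside the score is what makes such a metric inspectable: each region's contribution to the final value can be read off directly.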

      • Published in

        ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 3s
        October 2021
        324 pages
        ISSN: 1551-6857
        EISSN: 1551-6865
        DOI: 10.1145/3492435

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 15 November 2021
        • Accepted: 1 June 2021
        • Revised: 1 May 2021
        • Received: 1 December 2020

        Qualifiers

        • research-article
        • Refereed
