Abstract
In this article, we present an approach for retrieving similar faces between the artistic and the real domain. The application we refer to is an interactive exhibition inside a museum, in which visitors can take a photo of themselves and search for a lookalike in the collection of paintings. The task requires not only identifying faces but also extracting discriminative features from both artistic and photo-realistic images, thereby tackling a significant domain shift. Our method integrates feature extraction networks that account for the aesthetic similarity of two faces and for their correspondence in terms of semantic attributes. It also addresses the domain shift between realistic images and paintings by translating photo-realistic images into the artistic domain. Notably, by exploiting the same technique, our model does not need to rely on annotated data in the artistic domain. Experiments are conducted on different paired datasets to show the effectiveness of the proposed solution in terms of identity and attribute preservation. The approach is also evaluated in unpaired settings and in combination with an interactive relevance feedback strategy. Finally, we show how the proposed algorithm has been implemented in a real showcase at the Gallerie Estensi museum in Italy, with the participation of more than 1,100 visitors in just three days.
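To make the retrieval idea concrete, the following minimal sketch ranks paintings by combining a face-embedding similarity with a semantic-attribute similarity. It is an illustration under stated assumptions, not the method of the article: the abstract does not say how the two cues are fused, so the weighted sum, the `alpha` parameter, the function names, and the toy vectors are all hypothetical. In practice the vectors would come from the feature extraction networks, applied after the photo has been translated into the artistic domain.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_paintings(query_face, query_attrs, gallery, alpha=0.5):
    """Return painting names sorted from most to least similar to the query.

    gallery maps a painting name to a (face_vector, attribute_vector) pair.
    alpha weights face similarity against attribute similarity; this weighted
    sum is an assumed fusion rule, not one taken from the article.
    """
    scored = []
    for name, (face_vec, attr_vec) in gallery.items():
        score = (alpha * cosine(query_face, face_vec)
                 + (1 - alpha) * cosine(query_attrs, attr_vec))
        scored.append((score, name))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


# Toy gallery with made-up two-dimensional face vectors and three attributes.
gallery = {
    "portrait_a": ([1.0, 0.0], [1, 0, 1]),
    "portrait_b": ([0.0, 1.0], [0, 1, 0]),
}
ranking = rank_paintings([0.9, 0.1], [1, 0, 1], gallery)
print(ranking)
```

Setting `alpha` closer to 1 favors look-alike matches based on the face embedding, while values closer to 0 favor paintings that share the visitor's semantic attributes.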
Matching Faces and Attributes Between the Artistic and the Real Domain: the PersonArt Approach