Abstract
This article addresses diversely supervised visual product search, in which a query specifies a diverse set of labels to search for. Previous works have focused on representing attribute, instance, or category labels individually; we consider them together to form a diverse set of labels for visually describing products. We learn an embedding from the supervisory signal provided by every label, so that the embedding encodes the interrelationships among labels. Once the embedding is trained, every label has a corresponding visual representation in the embedding space, which is an aggregation of selected items from the training set. At search time, a composite query representation retrieves images that match a specific set of diverse labels. We form the composite query representation by averaging the aggregated representations of the labels in that set. For evaluation, we extend existing product datasets of cars and clothes with a diverse set of labels. Experiments show the benefits of our embedding for diversely supervised visual product search, both for seen and unseen product combinations, and for discovering product design styles.
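The query construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract says only that a label's representation is "an aggregation of selected items from the training set"; the sketch assumes a plain mean over all training items carrying the label, and assumes cosine similarity for ranking. The function names (`label_prototypes`, `composite_query`, `search`) and the toy 4-dimensional embeddings are hypothetical, standing in for the output of a trained embedding network.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def label_prototypes(embeddings, item_labels):
    """One vector per label: the mean embedding of the training items with that label.

    Assumption: the abstract does not say how items are selected or aggregated;
    a mean over every item that carries the label is used here.
    """
    all_labels = sorted({label for labels in item_labels for label in labels})
    prototypes = {}
    for label in all_labels:
        idx = [i for i, labels in enumerate(item_labels) if label in labels]
        prototypes[label] = embeddings[idx].mean(axis=0)
    return prototypes


def composite_query(prototypes, query_labels):
    """Average the per-label representations of the queried label set."""
    stacked = np.stack([prototypes[label] for label in query_labels])
    return l2_normalize(stacked.mean(axis=0))


def search(query, gallery, top_k=3):
    """Rank gallery items by cosine similarity to the query (assumed metric)."""
    scores = l2_normalize(gallery) @ query
    return np.argsort(-scores)[:top_k]


# Toy embeddings: dimensions 0-1 loosely encode colour, 2-3 encode body type.
embeddings = np.array([
    [1.0, 0.0, 1.0, 0.0],  # item 0: red sedan
    [1.0, 0.0, 0.0, 1.0],  # item 1: red suv
    [0.0, 1.0, 1.0, 0.0],  # item 2: blue sedan
    [0.0, 1.0, 0.0, 1.0],  # item 3: blue suv
])
item_labels = [
    {"red", "sedan"},
    {"red", "suv"},
    {"blue", "sedan"},
    {"blue", "suv"},
]

prototypes = label_prototypes(embeddings, item_labels)
query = composite_query(prototypes, ["red", "sedan"])
ranking = search(query, embeddings, top_k=4)
print(ranking)
```

In this toy setup the item carrying both queried labels ranks first and the item carrying neither ranks last. Because the query is assembled from per-label representations rather than from examples of the full combination, the same procedure applies to label sets that never co-occurred in training, which is the unseen-combination setting the abstract mentions.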