Diversely-Supervised Visual Product Search

Published: 27 January 2022

Abstract

This article strives for a diversely supervised visual product search, where queries specify a diverse set of labels to search for. Where previous works have focused on representing attribute, instance, or category labels individually, we consider them together to create a diverse set of labels for visually describing products. We learn an embedding from the supervisory signal provided by every label to encode their interrelationships. Once trained, every label has a corresponding visual representation in the embedding space, which is an aggregation of selected items from the training set. At search time, composite query representations retrieve images that match a specific set of diverse labels. We form composite query representations by averaging over the aggregated representations of each diverse label in the specific set. For evaluation, we extend existing product datasets of cars and clothes with a diverse set of labels. Experiments show the benefits of our embedding for diversely supervised visual product search in seen and unseen product combinations and for discovering product design styles.
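
The search-time procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the embedding has already been trained, uses a plain mean over all training items carrying a label as the aggregation (the abstract only says "selected items"), and ranks by cosine similarity. The function names `label_prototypes`, `composite_query`, and `search`, as well as the toy 4-D embeddings, are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def label_prototypes(embeddings, item_labels):
    """Build one representation per label by aggregating training embeddings.

    Assumption: the aggregation is a plain mean over every item with the label.
    """
    buckets = {}
    for vec, labels in zip(embeddings, item_labels):
        for label in labels:
            buckets.setdefault(label, []).append(vec)
    return {
        label: l2_normalize(np.mean(vecs, axis=0))
        for label, vecs in buckets.items()
    }


def composite_query(prototypes, query_labels):
    """Average the per-label representations of the requested labels."""
    stacked = np.stack([prototypes[label] for label in query_labels])
    return l2_normalize(stacked.mean(axis=0))


def search(query, gallery, top_k=3):
    """Return indices of the gallery items most similar to the query."""
    scores = l2_normalize(gallery) @ query
    return np.argsort(-scores)[:top_k]


# Toy stand-in for learned embeddings; dimensions loosely mean
# (red, blue, SUV, sedan). Real embeddings would come from a trained network.
train_embeddings = np.array([
    [1.0, 0.0, 1.0, 0.0],  # item 0: red SUV
    [1.0, 0.0, 0.0, 1.0],  # item 1: red sedan
    [0.0, 1.0, 1.0, 0.0],  # item 2: blue SUV
    [0.0, 1.0, 0.0, 1.0],  # item 3: blue sedan
])
train_labels = [("red", "suv"), ("red", "sedan"), ("blue", "suv"), ("blue", "sedan")]

prototypes = label_prototypes(train_embeddings, train_labels)
query = composite_query(prototypes, ["red", "suv"])
print(search(query, train_embeddings, top_k=4))  # item 0 (the red SUV) ranks first
```

Because the query is built from label representations rather than from a stored example, the same averaging could be applied to a label combination that never occurs in the training set, which is the unseen-combination setting the abstract mentions.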

Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 18, Issue 1
January 2022
517 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3505205

Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 27 January 2022
• Revised: 1 April 2021
• Accepted: 1 April 2021
• Received: 1 July 2020
Qualifiers

• research-article
• Refereed