Abstract
With the boom of the fashion market and people’s daily pursuit of beauty, clothing matching has attracted increasing research attention. At its core, this problem amounts to modeling the human notion of compatibility between fashion items, i.e., Fashion Compatibility Modeling (FCM), which plays an important role in a wide range of commercial applications, such as clothing recommendation and dressing assistance. Recent advances in multimedia processing have achieved remarkable accuracy in compatibility evaluation. However, these methods work as black boxes and cannot provide appropriate explanations, which are important for gaining users’ trust and improving their experience. In practice, fashion experts usually explain a compatibility evaluation through the matching patterns between fashion attributes (e.g., a silk tank top cannot go with a knit dress). Inspired by this, we devise an attribute-wise explainable FCM solution, named ExFCM, which simultaneously generates an item-level compatibility evaluation for the input fashion items and attribute-level explanations for the evaluation result. In particular, ExFCM consists of two key components: attribute-wise representation learning and attribute interaction modeling. The former learns a region-aware attribute representation for each item with threshold global average pooling. The latter adaptively compiles the attribute-level matching signals into the overall compatibility evaluation with an attentive interaction mechanism. Notably, ExFCM is trained without any attribute-level compatibility annotations, which facilitates its practical application. Extensive experiments on two real-world datasets show that ExFCM produces more accurate compatibility evaluations than existing methods, together with reasonable explanations.
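The abstract describes the two components of ExFCM only at a high level. The Python sketch below illustrates one plausible reading of each; it is not the authors' implementation, and it rests on two assumptions of our own. First, we take "threshold global average pooling" to mean averaging feature vectors only over the spatial positions whose attribute activation reaches a fraction `tau` of its peak, in the spirit of class activation mapping. Second, we model the attentive interaction mechanism after attentional factorization machines: element-wise products of top and bottom attribute embeddings, softmax attention over all attribute pairs, and a linear projection `p` to a scalar score. The function names and the parameters `tau`, `w_att`, and `p` are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def threshold_gap(feature_map: List[List[Vector]],
                  activation: List[List[float]],
                  tau: float = 0.5) -> Vector:
    """Hypothetical threshold global average pooling.

    Averages the C-dim feature vectors of an H x W grid, keeping only the
    positions whose attribute activation is at least tau * peak activation.
    """
    peak = max(max(row) for row in activation)
    selected = [feature_map[i][j]
                for i, row in enumerate(activation)
                for j, a in enumerate(row)
                if a >= tau * peak]
    if not selected:  # e.g., all activations negative: fall back to plain GAP
        selected = [v for row in feature_map for v in row]
    dim = len(selected[0])
    return [sum(v[k] for v in selected) / len(selected) for k in range(dim)]


def _hadamard(a: Vector, b: Vector) -> Vector:
    return [x * y for x, y in zip(a, b)]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attentive_compatibility(top_attrs: List[Vector],
                            bottom_attrs: List[Vector],
                            w_att: Vector,
                            p: Vector) -> Tuple[float, List[Tuple[float, int, int]]]:
    """AFM-style attentive pooling over all (top attribute, bottom attribute) pairs.

    Returns the overall compatibility score and, as an explanation, each pair's
    weighted contribution sorted from most to least positive.
    """
    pairs = [(i, j, _hadamard(t, b))
             for i, t in enumerate(top_attrs)
             for j, b in enumerate(bottom_attrs)]
    # Attention logits from a single linear layer (AFM uses a small MLP instead).
    logits = [_dot(w_att, h) for _, _, h in pairs]
    m = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Each pair's attribute-level matching signal, projected to a scalar.
    signals = [_dot(p, h) for _, _, h in pairs]
    contributions = [w * s for w, s in zip(weights, signals)]
    explanation = sorted(((c, i, j) for c, (i, j, _) in zip(contributions, pairs)),
                         reverse=True)
    return sum(contributions), explanation


if __name__ == "__main__":
    rep = threshold_gap([[[1.0, 0.0], [0.0, 1.0]],
                         [[2.0, 2.0], [4.0, 0.0]]],
                        [[0.1, 0.2],
                         [0.9, 1.0]], tau=0.5)
    print(rep)  # [3.0, 1.0]
    score, why = attentive_compatibility([[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]],
                                         w_att=[0.0, 0.0], p=[1.0, 2.0])
    print(score, why)  # 1.5 [(1.0, 1, 0), (0.5, 0, 0)]
```

In this sketch the overall score is exactly the sum of the per-pair weighted contributions, so ranking those contributions yields an attribute-level explanation without any attribute-level compatibility labels, which is the property the abstract emphasizes.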
Index Terms
Attribute-wise Explainable Fashion Compatibility Modeling
Recommendations
Fashion Compatibility Modeling through a Multi-modal Try-on-guided Scheme
SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
Recent years have witnessed a growing trend of fashion compatibility modeling, which scores the matching degree of the given outfit and then provides people with some dressing advice. Existing methods have primarily solved this problem by analyzing the ...
Fashion Meets Computer Vision: A Survey
Fashion is the way we present ourselves to the world and has become one of the world’s largest industries. Fashion, mainly conveyed by vision, has thus attracted much attention from computer vision researchers in recent years. Given the rapid ...
Attribute-aware heterogeneous graph network for fashion compatibility prediction
Fashion compatibility prediction aims to provide a compatibility score for a set of fashion combinations, making an effort to meet people’s needs for clothing matching in daily life. One difficulty of this problem is that whether the ...