Abstract
In modern society, people tend to prefer fashionable and decent outfits that meet more than basic physiological needs. A proper outfit usually relies on good matching among the complementary fashion items that compose it (e.g., the top, bottom, and shoes), which motivates us to investigate automatic complementary clothing matching. However, this task is non-trivial for three reasons. First, the main challenge lies in accurately modeling the compatibility between complementary fashion items (e.g., the top and bottom) that come from heterogeneous spaces and are described by multiple modalities (e.g., the visual and textual modalities). Second, since different features of fashion items (e.g., color, style, and pattern) may contribute differently to compatibility modeling, encoding the confidence of different pairwise features is a tough challenge. Third, jointly learning the latent representation of multi-modal data and the compatibility between complementary fashion items constitutes the last challenge. Toward this end, we present an end-to-end attention-based neural framework for compatibility modeling, in which a feature-level attention model adaptively learns the confidence of different pairwise features. Extensive experiments on a publicly available real-world dataset show the superiority of our model over state-of-the-art methods.
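To make the idea of feature-level attention concrete, the following is a minimal sketch and not the model described in this paper. It assumes each item is already encoded as one latent vector per feature (e.g., color, style, pattern), scores each top-bottom feature pair with a dot product, and weights those scores by softmax confidences. The function names and the scoring vector `attn_w` are hypothetical; in a trained model, the encoders and `attn_w` would be learned jointly.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attentive_compatibility(top_feats, bottom_feats, attn_w):
    """Score a top-bottom pair with feature-level attention.

    top_feats, bottom_feats: one latent vector per feature, same order.
    attn_w: hypothetical scoring vector (an assumption, not from the paper).
    Returns (compatibility score, per-feature confidences).
    """
    # Compatibility of each pairwise feature, taken in isolation.
    pair_scores = [dot(t, b) for t, b in zip(top_feats, bottom_feats)]
    # Attention logit per feature pair: attn_w applied to the element-wise product.
    logits = [
        dot(attn_w, [a * b for a, b in zip(t, b_vec)])
        for t, b_vec in zip(top_feats, bottom_feats)
    ]
    confidences = softmax(logits)
    # Final score: confidence-weighted sum of the per-feature scores.
    score = sum(c * s for c, s in zip(confidences, pair_scores))
    return score, confidences


# Toy input: feature 0 agrees between the items, feature 1 conflicts.
top = [[1.0, 0.0], [0.0, 1.0]]
bottom = [[1.0, 0.0], [0.0, -1.0]]
score, conf = attentive_compatibility(top, bottom, attn_w=[2.0, 0.0])
```

With a zero `attn_w`, every feature pair receives the same confidence and the score reduces to a plain average of the per-feature scores; a nonzero `attn_w` lets some feature pairs dominate the final score.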
Index Terms
An End-to-End Attention-Based Neural Model for Complementary Clothing Matching