Abstract
Visual sentiment analysis is attracting increasing attention with the rapidly growing number of images uploaded to social networks. Learning rich visual representations often requires training deep convolutional neural networks (CNNs) on massive manually labeled data, which is expensive to obtain and scarce, especially for a subjective task like visual sentiment analysis. Meanwhile, large quantities of social images, though noisily labeled, are readily available by querying social networks with sentiment categories as keywords, so that images related to a specific sentiment can be easily collected. In this article, we propose a multiple kernel network for visual sentiment recognition that learns representations from both strongly and weakly supervised CNNs. Specifically, the weakly supervised deep model is trained on large-scale social image data, whereas the strongly supervised deep model is fine-tuned on manually annotated affective datasets. We apply a multiple kernel scheme to multiple layers of the CNNs, which automatically selects discriminative representations by learning a linear combination of a set of pre-defined kernels. In addition, we introduce a large-scale dataset collected from popular comics of various countries, including America, Japan, China, and France, which consists of 11,821 images with diverse artistic styles. Experimental results show that the multiple kernel network achieves consistent improvements over state-of-the-art methods on public affective datasets as well as the newly established Comics dataset. The Comics dataset can be found at http://cv.nankai.edu.cn/projects/Comic.
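To make the multiple kernel idea concrete, the sketch below illustrates the general scheme the abstract describes: features from two hypothetical CNN branches (standing in for the strongly and weakly supervised models) each yield a pre-defined base kernel, and a linear combination of those kernels is learned before feeding a kernel classifier. This is a minimal toy illustration, not the paper's actual algorithm: the features are randomly generated, and the combination weights are set by a simple kernel-target-alignment heuristic rather than the optimization used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical deep features from two branches; in the paper these would come
# from layers of the strongly and weakly supervised CNNs.
n = 40
y = np.array([0] * (n // 2) + [1] * (n // 2))
feat_a = rng.normal(loc=y[:, None] * 1.5, scale=1.0, size=(n, 8))   # branch A
feat_b = rng.normal(loc=y[:, None] * 1.0, scale=1.0, size=(n, 16))  # branch B

def rbf_kernel(X, gamma):
    """One pre-defined base kernel on a single feature set."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

# The set of pre-defined base kernels, one per feature set here.
kernels = [rbf_kernel(feat_a, 0.1), rbf_kernel(feat_b, 0.1)]

# Toy surrogate for learning the linear combination: score each base kernel
# by its alignment with the ideal label kernel yy^T, then normalize.
Y = np.where(y[:, None] == y[None, :], 1.0, -1.0)
align = np.array([np.sum(K * Y) / (np.linalg.norm(K) * np.linalg.norm(Y))
                  for K in kernels])
weights = align / align.sum()
K_combined = sum(w * K for w, K in zip(weights, kernels))

# Kernel ridge classifier on the combined kernel (train-set illustration only).
alpha = np.linalg.solve(K_combined + 1e-3 * np.eye(n),
                        np.where(y == 1, 1.0, -1.0))
pred = (K_combined @ alpha > 0).astype(int)
print("combination weights:", np.round(weights, 3))
print("train accuracy:", (pred == y).mean())
```

In the full method, each CNN layer contributes its own base kernel, so the learned weights indicate which layers of which branch carry the most discriminative sentiment information.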
Learning Discriminative Sentiment Representation from Strongly- and Weakly Supervised CNNs