Abstract
The wide popularity of digital photography and social networks has generated a rapidly growing volume of multimedia data (i.e., images, music, and videos), resulting in a great demand for managing, retrieving, and understanding these data. Affective computing (AC) of these data can help us understand human behavior and enables a wide range of applications. In this article, we comprehensively survey state-of-the-art AC technologies for large-scale heterogeneous multimedia data. We begin by introducing the typical emotion representation models from psychology that are widely employed in AC, and briefly describe the available datasets for evaluating AC algorithms. We then summarize and compare representative AC methods for different multimedia types, i.e., images, music, videos, and multimodal data, focusing on both handcrafted feature-based methods and deep learning methods. Finally, we discuss some challenges and future directions for multimedia affective computing.
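To make the contrast between handcrafted feature-based and deep learning methods concrete, the sketch below shows a minimal deep-learning baseline for discrete image emotion classification: a pretrained CNN backbone whose classifier head is replaced and fine-tuned on emotion labels. The label set (the eight Mikels categories), the ResNet-50 backbone, and the preprocessing choices are illustrative assumptions, not a method prescribed by this survey.

```python
# Minimal sketch (assumptions noted above): transfer learning for
# discrete image emotion classification with a pretrained CNN.
import torch
import torch.nn as nn
from torchvision import models, transforms

# Eight discrete emotion categories from Mikels et al., used here
# purely as an example label set.
MIKELS_CATEGORIES = [
    "amusement", "awe", "contentment", "excitement",
    "anger", "disgust", "fear", "sadness",
]

def build_model(num_classes: int = len(MIKELS_CATEGORIES)) -> nn.Module:
    # Start from ImageNet weights and swap the classification head,
    # a common transfer-learning recipe for affective image analysis.
    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

# Standard ImageNet preprocessing expected by the pretrained backbone.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

if __name__ == "__main__":
    model = build_model().eval()
    dummy = torch.randn(1, 3, 224, 224)   # stands in for a preprocessed image
    with torch.no_grad():
        logits = model(dummy)
    probs = torch.softmax(logits, dim=1)  # distribution over emotion categories
    print(dict(zip(MIKELS_CATEGORIES, probs[0].tolist())))
```

A handcrafted alternative would instead extract mid-level descriptors (e.g., color, texture, or principles-of-art features) and feed them to a classical classifier such as an SVM; dimensional approaches would replace the softmax head with a regressor over valence-arousal scores.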