Abstract
Video summaries present the user with a condensed and succinct representation of the content of a video stream. Usually this is achieved by attaching degrees of importance to low-level image, audio and text features. However, video content elicits strong and measurable physiological responses in the user, which are potentially rich indicators of what video content is memorable to or emotionally engaging for an individual user. This article proposes a technique that exploits such physiological responses to a given video stream by a given user to produce Entertainment-Led VIdeo Summaries (ELVIS). ELVIS is made up of five analysis phases which correspond to the analyses of five physiological response measures: electro-dermal response (EDR), heart rate (HR), blood volume pulse (BVP), respiration rate (RR), and respiration amplitude (RA). Through these analyses, the temporal locations of the most entertaining video subsegments, as they occur within the video stream as a whole, are automatically identified. The effectiveness of the ELVIS technique is verified through a statistical analysis of data collected during a set of user trials. Our results show that ELVIS is more consistent than RANDOM, EDR, HR, BVP, RR and RA selections in identifying the most entertaining video subsegments for content in the comedy, horror/comedy, and horror genres. Subjective user reports also reveal that ELVIS video summaries are comparatively easy to understand, enjoyable, and informative.
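As a rough illustration of the idea described in the abstract, the sketch below ranks video subsegments by combining per-subsegment response scores from the five physiological measures. The abstract does not state how ELVIS scores or combines the measures, so everything specific here is an assumption made for illustration: the function names `zscores` and `rank_subsegments`, the input layout (one list of response magnitudes per measure), the use of z-score normalization, and the equal-weight sum. It should not be read as the authors' actual procedure.

```python
from statistics import mean, pstdev

# The five physiological measures named in the abstract.
MEASURES = ("EDR", "HR", "BVP", "RR", "RA")


def zscores(values):
    """Normalize a list of numbers to zero mean and unit variance.

    Returns all zeros when the values are constant, to avoid dividing by zero.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def rank_subsegments(responses, top_k):
    """Return the indices of the top_k subsegments in temporal order.

    `responses` maps each measure name to a list holding one response
    magnitude per video subsegment (hypothetical input format). Each measure
    is normalized separately so that measures on different scales contribute
    equally, then the normalized scores are summed (an assumed combination
    rule, not taken from the article).
    """
    n = len(responses[MEASURES[0]])
    combined = [0.0] * n
    for measure in MEASURES:
        for i, z in enumerate(zscores(responses[measure])):
            combined[i] += z
    # Pick the highest-scoring subsegments, then restore temporal order.
    best = sorted(range(n), key=lambda i: combined[i], reverse=True)[:top_k]
    return sorted(best)


# Made-up response magnitudes for four subsegments, for demonstration only.
example = {
    "EDR": [0.2, 0.1, 0.5, 0.0],
    "HR": [72.0, 71.0, 75.0, 70.0],
    "BVP": [2.0, 1.0, 5.0, 0.0],
    "RR": [14.0, 13.0, 17.0, 12.0],
    "RA": [0.4, 0.3, 0.7, 0.2],
}
selected = rank_subsegments(example, top_k=2)
print(selected)
```

Returning the selected indices in temporal order reflects the abstract's point that subsegments are located as they occur within the video stream as a whole; a summary would then be assembled by playing those subsegments in sequence.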
Index Terms
ELVIS: Entertainment-led video summaries