ABSTRACT
Given the deluge of multimedia content becoming available over the Internet, it is increasingly important to be able to examine and organize these large stores of information effectively, in ways that go beyond browsing or collaborative filtering. In this paper we review previous work on audio and video processing and define the task of Topic-Oriented Multimedia Summarization (TOMS) using natural language generation: given a set of features automatically extracted from a video (such as visual concepts and ASR transcripts), a TOMS system automatically generates a paragraph of natural language ("a recounting") that summarizes the important information in a video belonging to a certain topic area and explains why the video was matched and retrieved. We see this as a first step towards systems that will be able to discriminate between visually similar but semantically different videos, compare two videos and provide textual output, or summarize a large number of videos at once. We then introduce our approach to the TOMS problem: we extract visual concept features and ASR transcription features from a given video and develop a template-based natural language generation system that produces a textual recounting from the extracted features. We also propose possible experimental designs for continuously evaluating and improving TOMS systems, and present results of a pilot evaluation of our initial system.
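To make the idea of a template-based recounting concrete, the following is a minimal sketch in Python. It is not the system described in the paper: the function names `join_phrases` and `recount`, the sentence templates, the score threshold, and the input format (a dictionary of concept scores plus a list of ASR keywords) are all assumptions chosen for illustration.

```python
def join_phrases(items):
    """Join phrases as 'a', 'a and b', or 'a, b, and c'."""
    items = list(items)
    if not items:
        return ""
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return f"{items[0]} and {items[1]}"
    return ", ".join(items[:-1]) + f", and {items[-1]}"


def recount(topic, concept_scores, asr_keywords, min_score=0.5, max_items=3):
    """Fill fixed sentence templates with extracted features (hypothetical).

    topic          -- name of the topic area the video was matched to
    concept_scores -- dict mapping visual concept name to detector score
    asr_keywords   -- keywords taken from the ASR transcript, in priority order
    """
    # Keep only the highest-scoring visual concepts above the threshold.
    ranked = sorted(concept_scores.items(), key=lambda kv: kv[1], reverse=True)
    concepts = [name for name, score in ranked if score >= min_score][:max_items]
    keywords = list(asr_keywords)[:max_items]

    sentences = [f'This video was retrieved for the topic "{topic}".']
    if concepts:
        sentences.append(f"Visually, it appears to show {join_phrases(concepts)}.")
    if keywords:
        sentences.append(f"The speech transcript mentions {join_phrases(keywords)}.")
    if not concepts and not keywords:
        sentences.append("No supporting visual or spoken evidence was extracted.")
    return " ".join(sentences)


if __name__ == "__main__":
    print(recount(
        "making a sandwich",
        {"kitchen": 0.9, "bread": 0.8, "car": 0.1},
        ["butter", "knife"],
    ))
```

The sketch separates evidence selection (thresholding and ranking) from surface realization (the templates), so either part could be replaced without touching the other; a real system would need richer templates and some handling of noisy detector output.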
Index Terms
Beyond audio and video retrieval: towards multimedia summarization