DOI: 10.1145/2324796.2324799
research-article

Beyond audio and video retrieval: towards multimedia summarization

Published: 05 June 2012

ABSTRACT

Given the deluge of multimedia content becoming available over the Internet, it is increasingly important to be able to examine and organize these large stores of information effectively, in ways that go beyond browsing or collaborative filtering. In this paper we review previous work on audio and video processing, and define the task of Topic-Oriented Multimedia Summarization (TOMS) using natural language generation: given a set of features automatically extracted from a video (such as visual concepts and ASR transcripts), a TOMS system automatically generates a paragraph of natural language ("a recounting") that summarizes the important information in a video belonging to a certain topic area and explains why the video was matched and retrieved. We see this as a first step towards systems that will be able to discriminate between visually similar but semantically different videos, compare two videos and provide textual output, or summarize a large number of videos at once. We then introduce our approach to solving the TOMS problem: we extract visual concept features and ASR transcription features from a given video, and develop a template-based natural language generation system that produces a textual recounting from the extracted features. We also propose possible experimental designs for continuously evaluating and improving TOMS systems, and present the results of a pilot evaluation of our initial system.
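
As a rough illustration of what template-based recounting from extracted features could look like, the following minimal Python sketch fills fixed sentence templates with the most confident visual concepts and a few ASR keywords. It is not the authors' system: the `VideoFeatures` container, the `recount` function, the confidence threshold, and the template wording are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class VideoFeatures:
    """Hypothetical container for features extracted from one video."""
    topic: str                         # topic area the video was matched to
    visual_concepts: dict[str, float]  # concept name -> detector confidence in [0, 1]
    asr_keywords: list[str]            # salient words taken from the ASR transcript


def join_phrases(items):
    """Join phrases as an English list: 'a', 'a and b', 'a, b and c'."""
    items = list(items)
    if not items:
        return ""
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + " and " + items[-1]


def recount(features, min_confidence=0.5, max_concepts=3):
    """Fill fixed templates with the strongest evidence (illustrative only)."""
    # Rank concepts by confidence, then keep those above the assumed threshold.
    ranked = sorted(features.visual_concepts.items(),
                    key=lambda kv: kv[1], reverse=True)
    concepts = [name for name, score in ranked
                if score >= min_confidence][:max_concepts]

    sentences = [f"This video was matched to the topic '{features.topic}'."]
    if concepts:
        sentences.append(
            f"The visual evidence includes {join_phrases(concepts)}.")
    if features.asr_keywords:
        quoted = [f'"{word}"' for word in features.asr_keywords]
        sentences.append(
            f"The speech transcript mentions {join_phrases(quoted)}.")
    if not concepts and not features.asr_keywords:
        sentences.append("No supporting evidence passed the confidence threshold.")
    return " ".join(sentences)


example = VideoFeatures(
    topic="birthday party",
    visual_concepts={"cake": 0.9, "balloons": 0.7, "car": 0.2},
    asr_keywords=["happy", "birthday"],
)
print(recount(example))
```

In this sketch the low-confidence concept ("car") is left out of the recounting, which mirrors the idea that the generated text should explain why a video was matched rather than list every detector output.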


Published in

ICMR '12: Proceedings of the 2nd ACM International Conference on Multimedia Retrieval
June 2012
489 pages
ISBN: 9781450313292
DOI: 10.1145/2324796

Copyright © 2012 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

ICMR '12 paper acceptance rate: 50 of 145 submissions, 34%. Overall acceptance rate: 254 of 830 submissions, 31%.
