Abstract
This article considers multimedia question answering beyond factoid and how-to questions. We are interested in searching videos for answers to opinion-oriented questions that are controversial and hotly debated. Examples include “Should Edward Snowden be pardoned?” and “Obamacare: unconstitutional or not?”. Such questions often evoke emotional responses, positive or negative, and are therefore likely to be better answered by videos than by text, because videos vividly display emotional signals through facial expression and speaking tone. Nevertheless, a 60-second answer may be embedded in a 10-minute video, which degrades the user experience compared with reading the answer as text. Furthermore, a text-based opinion question may be short and vague, while video answers tend to be spoken, grammatically less structured, and noisy because of speech transcription errors. Direct word matching or syntactic analysis of sentence structure, as adopted in factoid and how-to question answering, is therefore unlikely to find video answers. The first problem, answer localization, is addressed by audiovisual analysis of the emotional signals in videos to locate segments likely to express opinions. The second problem, matching questions to answers, is tackled by a deep architecture that nonlinearly matches the text words of questions with the speech in videos. Experiments are conducted on eight controversial topics, using questions crawled from Yahoo! Answers and Internet videos from YouTube.
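The answer localization problem can be illustrated with a minimal sketch. It assumes that every second of a video already carries an opinion-likelihood score (a hypothetical input standing in for the audiovisual emotional signals) and selects the fixed-length window with the highest total score. The function name `localize_clip`, the per-second scoring, and the fixed window length are illustrative assumptions, not the method used in the article.

```python
from typing import Sequence, Tuple


def localize_clip(scores: Sequence[float], window: int) -> Tuple[int, int]:
    """Return (start, end) in seconds of the highest-scoring window.

    `scores` holds one hypothetical opinion-likelihood value per second;
    `end` is exclusive. Ties are resolved in favor of the earliest window.
    """
    if window <= 0 or window > len(scores):
        raise ValueError("window must be between 1 and len(scores)")

    # Score of the first window, then slide one second at a time,
    # adding the entering second and dropping the leaving one.
    current = sum(scores[:window])
    best_sum, best_start = current, 0
    for start in range(1, len(scores) - window + 1):
        current += scores[start + window - 1] - scores[start - 1]
        if current > best_sum:
            best_sum, best_start = current, start
    return best_start, best_start + window


if __name__ == "__main__":
    # A 10-minute video (600 s) with an emotional passage at 200-260 s.
    per_second = [0.0] * 600
    per_second[200:260] = [1.0] * 60
    print(localize_clip(per_second, window=60))
```

A fixed window keeps the sketch short; a practical system would more likely let the segment length vary and would need a real scoring model in place of the synthetic values used here.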
Index Terms
Opinion Question Answering by Sentiment Clip Localization