Abstract
This article presents a user evaluation that studies the performance difference between interactive and automatic video search. In particular, the study aims to provide empirical insights into how the performance landscape of video search changes when tens of thousands of concept detectors are freely available for query formulation. We compare three search modes: free-to-play (i.e., searching from scratch), non-free-to-play (i.e., searching by inspecting results provided by automatic search), and automatic search, covering both concept-free and concept-based retrieval paradigms. A total of 40 participants each perform interactive search over 15 queries of varying difficulty using two search modes on the IACC.3 dataset provided by the TRECVid organizers. The study suggests that the performance of automatic search still lags far behind interactive search. Furthermore, providing users with the results of automatic search for exploration shows no clear advantage over asking users to search from scratch. The study also analyzes user behavior to reveal how users compose queries, browse results, and discover new query terms, which can serve as a guideline for future research on both interactive and automatic search.
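To make the concept-based retrieval paradigm mentioned above concrete, the sketch below illustrates the general idea: a free-text query is mapped onto a bank of pre-trained concept detectors, and each video shot is ranked by the responses of the matched concepts. This is a minimal, hypothetical illustration, not the system evaluated in the study; the shot identifiers, concept names, detector scores, and the exact-match selection rule (a stand-in for embedding-based query-to-concept mapping) are all toy assumptions.

```python
# Toy detector bank: per-shot, probability-like concept responses
# (hypothetical values for illustration only).
SHOT_CONCEPT_SCORES = {
    "shot_001": {"dog": 0.92, "beach": 0.80, "person": 0.40},
    "shot_002": {"dog": 0.10, "kitchen": 0.85, "person": 0.90},
    "shot_003": {"beach": 0.95, "person": 0.70},
}

def select_concepts(query, vocabulary):
    """Map query terms to detector names by exact string match -- a crude
    stand-in for the embedding- or ontology-based query-to-concept
    mapping used in concept-based search systems."""
    terms = set(query.lower().split())
    return [concept for concept in vocabulary if concept in terms]

def rank_shots(query, shot_scores):
    """Rank shots by the average response of the selected concepts."""
    vocabulary = {c for scores in shot_scores.values() for c in scores}
    concepts = select_concepts(query, vocabulary)
    ranked = []
    for shot, scores in shot_scores.items():
        # A concept missing from a shot contributes a zero response.
        score = sum(scores.get(c, 0.0) for c in concepts) / max(len(concepts), 1)
        ranked.append((shot, score))
    ranked.sort(key=lambda pair: -pair[1])
    return ranked

ranking = rank_shots("a dog on the beach", SHOT_CONCEPT_SCORES)
```

For the query "a dog on the beach", only the `dog` and `beach` detectors fire strongly on the same shot, so that shot rises to the top; how concepts are selected and fused is precisely where concept-based systems differ.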
Interactive Search vs. Automatic Search: An Extensive Study on Video Retrieval