Research Article

Interactive Search vs. Automatic Search: An Extensive Study on Video Retrieval

Published: 11 May 2021

Abstract

This article conducts a user evaluation to study the performance difference between interactive and automatic search. In particular, the study aims to provide empirical insight into how the performance landscape of video search changes when tens of thousands of concept detectors are freely available to exploit for query formulation. We compare three search modes: free-to-play (i.e., search from scratch), non-free-to-play (i.e., search by inspecting results provided by automatic search), and automatic search, covering both concept-free and concept-based retrieval paradigms. The study involves a total of 40 participants; each performs interactive search over 15 queries of varying difficulty using two search modes on the IACC.3 dataset provided by the TRECVid organizers. The study suggests that the performance of automatic search still lags far behind that of interactive search. Furthermore, providing users with the results of automatic search for exploration shows no obvious advantage over asking users to search from scratch. The study also analyzes user behavior to reveal how users compose queries, browse results, and discover new query terms for search, which can serve as a guideline for future research on both interactive and automatic search.

