research-article

Lifelog Image Retrieval Based on Semantic Relevance Mapping

Published: 22 July 2021

Abstract

Lifelog analytics is an emerging research area with technologies embracing the latest advances in machine learning, wearable computing, and data analytics. However, state-of-the-art technologies are still inadequate to distill voluminous multimodal lifelog data into high-quality insights. In this article, we propose a novel semantic relevance mapping (SRM) method to tackle the problem of lifelog information access. We formulate lifelog image retrieval as a series of mapping processes where a semantic gap exists for relating basic semantic attributes with high-level query topics. The SRM serves both as a formalism to construct a trainable model to bridge the semantic gap and as an algorithm to implement the training process on real-world lifelog data. Based on the SRM, we propose a computational framework of lifelog analytics to support various applications of lifelog information access, such as image retrieval, summarization, and insight visualization. Systematic evaluations are performed on three challenging benchmarking tasks to show the effectiveness of our method.



      • Published in

        ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 3
        August 2021
        443 pages
        ISSN: 1551-6857
        EISSN: 1551-6865
        DOI: 10.1145/3476118

        Copyright © 2021 Association for Computing Machinery.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 22 July 2021
        • Accepted: 1 December 2020
        • Revised: 1 November 2020
        • Received: 1 June 2020

        Qualifiers

        • research-article
        • Refereed
