Abstract
Lifelog analytics is an emerging research area whose technologies draw on the latest advances in machine learning, wearable computing, and data analytics. However, state-of-the-art technologies remain inadequate for distilling voluminous multimodal lifelog data into high-quality insights. In this article, we propose a novel semantic relevance mapping (SRM) method to tackle the problem of lifelog information access. We formulate lifelog image retrieval as a series of mapping processes in which a semantic gap exists between basic semantic attributes and high-level query topics. The SRM serves both as a formalism for constructing a trainable model that bridges the semantic gap and as an algorithm for implementing the training process on real-world lifelog data. Based on the SRM, we propose a computational framework of lifelog analytics to support various applications of lifelog information access, such as image retrieval, summarization, and insight visualization. Systematic evaluations on three challenging benchmarking tasks show the effectiveness of our method.
Lifelog Image Retrieval Based on Semantic Relevance Mapping