Abstract
Recognizing activities from egocentric multimodal data collected by wearable cameras and sensors is attracting growing interest, since multimodal methods benefit from the complementarity of different modalities. However, high-dimensional videos carry rich high-level semantic information, while low-dimensional sensor signals describe only simple motion patterns of the wearer; this large modality gap makes fusing the raw data challenging. Moreover, the cost of data collection and annotation means that large-scale egocentric multimodal datasets are scarce, which poses a second challenge for training complex deep learning models. To address both challenges jointly, we propose a knowledge-driven multimodal activity recognition framework that exploits external knowledge to fuse multimodal data and to reduce the dependence on large-scale training samples. Specifically, we design a dual-GCLSTM (Graph Convolutional LSTM) and a multi-layer GCN (Graph Convolutional Network) that collectively model the relations among activities and intermediate objects. The dual-GCLSTM fuses temporal multimodal features under top-down, relation-aware guidance, and a co-attention mechanism adaptively attends to the features of different modalities at each timestep. The multi-layer GCN learns relation-aware classifiers for the activity categories. Experimental results on three publicly available egocentric multimodal datasets demonstrate the effectiveness of the proposed model.
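To make the GCLSTM component concrete: a graph-convolutional LSTM can be pictured as an ordinary LSTM cell whose input-to-gate and state-to-gate transforms are replaced by graph convolutions over a node set (here, activity and object concepts). The following is a minimal PyTorch sketch in that generic style; the names `GraphConv`, `GCLSTMCell`, and the normalized adjacency `a_hat` are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class GraphConv(nn.Module):
    """One-hop graph convolution: H' = A_hat @ (H W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, x, a_hat):
        # x: (num_nodes, in_dim); a_hat: normalized adjacency (num_nodes, num_nodes)
        return a_hat @ self.weight(x)


class GCLSTMCell(nn.Module):
    """LSTM cell whose gate transforms are graph convolutions, so each
    node's hidden state is updated with relation-aware context from its
    neighbors in the concept graph."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.gc_x = GraphConv(in_dim, 4 * hid_dim)   # gates from the input
        self.gc_h = GraphConv(hid_dim, 4 * hid_dim)  # gates from the hidden state

    def forward(self, x, h, c, a_hat):
        gates = self.gc_x(x, a_hat) + self.gc_h(h, a_hat)
        i, f, o, g = gates.chunk(4, dim=-1)
        c_next = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h_next = torch.sigmoid(o) * torch.tanh(c_next)
        return h_next, c_next
```

In a "dual" setup, one such cell per modality (video, sensor) would be unrolled over the timesteps, with the two hidden-state streams fused at each step.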
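The co-attention described in the abstract amounts to a per-timestep soft weighting that lets the fused feature lean on whichever modality is more informative at that moment. Below is one common way to realize this; `ModalityCoAttention` and its projection and scoring layers are hypothetical, a sketch of the general mechanism rather than the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModalityCoAttention(nn.Module):
    """Adaptively weight video and sensor features at each timestep."""
    def __init__(self, vid_dim, sen_dim, fused_dim):
        super().__init__()
        self.proj_v = nn.Linear(vid_dim, fused_dim)  # project video feature
        self.proj_s = nn.Linear(sen_dim, fused_dim)  # project sensor feature
        self.score = nn.Linear(fused_dim, 1)         # shared attention scorer

    def forward(self, v_t, s_t):
        # v_t: (batch, vid_dim) video feature at timestep t
        # s_t: (batch, sen_dim) sensor feature at timestep t
        pv = torch.tanh(self.proj_v(v_t))
        ps = torch.tanh(self.proj_s(s_t))
        logits = torch.cat([self.score(pv), self.score(ps)], dim=-1)  # (batch, 2)
        alpha = F.softmax(logits, dim=-1)             # attention over modalities
        fused = alpha[:, :1] * pv + alpha[:, 1:] * ps
        return fused, alpha
```

The returned `alpha` makes the modality weighting inspectable, e.g., sensor signals may dominate during walking while video dominates during object manipulation.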
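Finally, "learning relation-aware classifiers" with a multi-layer GCN typically means running a GCN over a knowledge graph of concepts (initialized with word embeddings) and treating the output node vectors as classifier weights, so that related activities share statistical strength. This is a minimal two-layer sketch in that spirit, assuming a normalized adjacency `a_hat` built from an external knowledge source and an index list `activity_idx` selecting the activity nodes; all names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RelationAwareClassifiers(nn.Module):
    """Two-layer GCN over a concept graph; output node vectors serve as
    the classifier weights applied to the fused multimodal feature."""
    def __init__(self, emb_dim, hid_dim, feat_dim):
        super().__init__()
        self.w1 = nn.Linear(emb_dim, hid_dim, bias=False)
        self.w2 = nn.Linear(hid_dim, feat_dim, bias=False)

    def forward(self, node_emb, a_hat, fused_feat, activity_idx):
        # node_emb: (num_nodes, emb_dim) word embeddings of concepts
        # a_hat:    (num_nodes, num_nodes) normalized adjacency
        h = F.relu(a_hat @ self.w1(node_emb))
        w = a_hat @ self.w2(h)          # (num_nodes, feat_dim)
        cls_w = w[activity_idx]         # keep only the activity nodes
        return fused_feat @ cls_w.t()   # (batch, num_activities) logits
```

Because the classifier weights are propagated through the graph rather than learned independently per class, rarely observed activities can borrow evidence from related, better-sampled ones, which is what reduces the dependence on large-scale training data.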