
Knowledge-driven Egocentric Multimodal Activity Recognition

Published: 17 December 2020

Abstract

Recognizing activities from egocentric multimodal data collected by wearable cameras and sensors is gaining interest, as multimodal methods benefit from the complementarity of different modalities. However, a large modality gap separates the two sources: high-dimensional videos carry rich high-level semantic information, while low-dimensional sensor signals describe only simple motion patterns of the wearer, which makes fusing the raw data challenging. Moreover, the cost of data collection and annotation has left a shortage of large-scale egocentric multimodal datasets, posing a second challenge for training complex deep learning models. To address both challenges jointly, we propose a knowledge-driven multimodal activity recognition framework that exploits external knowledge to fuse multimodal data and to reduce the dependence on large-scale training samples. Specifically, we design a dual-GCLSTM (Graph Convolutional LSTM) and a multi-layer GCN (Graph Convolutional Network) to collectively model the relations among activities and intermediate objects. The dual-GCLSTM fuses temporal multimodal features under top-down relation-aware guidance, and a co-attention mechanism adaptively attends to the features of different modalities at different timesteps. The multi-layer GCN learns relation-aware classifiers for the activity categories. Experimental results on three publicly available egocentric multimodal datasets demonstrate the effectiveness of the proposed model.
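The two fusion ideas named in the abstract can be made concrete with a short sketch. The following is a minimal PyTorch illustration, under stated assumptions, of (a) a co-attention step that adaptively weights video and sensor features at each timestep and (b) a multi-layer GCN that maps knowledge-graph node embeddings to relation-aware classifier weights. It is not the authors' implementation: the dual-GCLSTM and the knowledge-graph construction are omitted, and all module names, shapes, and the placeholder adjacency are hypothetical.

```python
# Hedged sketch of co-attention fusion + GCN-derived classifiers.
# Not the paper's code; names and shapes are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GCNLayer(nn.Module):
    """One graph convolution: H' = act(A_hat @ H @ W), per Kipf & Welling."""
    def __init__(self, in_dim, out_dim, act=True):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.act = act

    def forward(self, h, a_hat):
        # a_hat: (N, N) normalized adjacency with self-loops; h: (N, in_dim)
        h = self.linear(a_hat @ h)
        return F.relu(h) if self.act else h


class CoAttentionFusion(nn.Module):
    """Softmax-weighted fusion of two modality features at each timestep."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, video_feat, sensor_feat):
        # video_feat, sensor_feat: (B, T, D) per-timestep features
        stacked = torch.stack([video_feat, sensor_feat], dim=2)  # (B, T, 2, D)
        alpha = F.softmax(self.score(stacked), dim=2)            # (B, T, 2, 1)
        return (alpha * stacked).sum(dim=2)                      # (B, T, D)


# Toy usage: score activities by matching pooled fused features against the
# classifier weights the GCN produces from (placeholder) node embeddings.
B, T, D, N = 4, 16, 256, 20               # batch, timesteps, feat dim, classes
video, sensor = torch.randn(B, T, D), torch.randn(B, T, D)
a_hat = torch.eye(N)                      # placeholder adjacency (identity)
node_emb = torch.randn(N, 300)            # e.g., embeddings of activity names

fuse = CoAttentionFusion(D)
g1, g2 = GCNLayer(300, 512), GCNLayer(512, D, act=False)

fused = fuse(video, sensor).mean(dim=1)        # (B, D), mean-pooled over time
classifiers = g2(g1(node_emb, a_hat), a_hat)   # (N, D) relation-aware weights
logits = fused @ classifiers.t()               # (B, N) activity scores
```

In this reading, the knowledge graph acts on the classifier side rather than the feature side: the GCN propagates information along activity-object relations so that related categories end up with correlated classifiers, which is what reduces the dependence on large training sets.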



Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 16, Issue 4 (November 2020), 372 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3444749
Copyright © 2020 ACM
Publisher: Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 1 January 2020
• Revised: 1 June 2020
• Accepted: 1 July 2020
• Published: 17 December 2020
