Learning Video-Text Aligned Representations for Video Captioning

Published: 06 February 2023

Abstract

Video captioning requires a model to understand video, align video with text, and generate text. Because of the semantic gap between vision and language, video-text alignment, which maps representations from the visual domain to the language domain, is a crucial step in reducing this gap. However, existing methods often overlook this step, so the decoder has to take visual representations directly as input, which increases its workload and limits its ability to generate semantically correct captions. In this paper, we propose a video-text alignment module with a retrieval unit and an alignment unit to learn video-text aligned representations for video captioning. Specifically, we first design a retrieval unit that retrieves sentences as an additional input, serving as a semantic anchor between the visual scene and the language description. Then, an alignment unit takes the video and the retrieved sentences as input and aligns the representations of the two modalities in a shared semantic space. The resulting video-text aligned representations are used to generate semantically correct captions. Moreover, the retrieved sentences provide rich semantic concepts that help generate distinctive captions. Experiments on two public benchmarks, i.e., VATEX and MSR-VTT, demonstrate that our method outperforms state-of-the-art methods by a large margin. Qualitative analysis shows that our method generates correct and distinctive captions.
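To make the pipeline described in the abstract concrete, below is a minimal PyTorch-style sketch of how a retrieval unit and an alignment unit could be wired together before a caption decoder. This is not the authors' implementation: the module names, feature dimensions, the cosine-similarity retrieval strategy, and the cross-attention fusion are all assumptions chosen for illustration.

```python
# Illustrative sketch (not the authors' code) of the described pipeline:
# retrieve sentences for a video, then align video and retrieved-sentence
# features in a shared semantic space before decoding a caption.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VideoTextAlignment(nn.Module):
    def __init__(self, video_dim=1024, text_dim=768, shared_dim=512, num_heads=8):
        super().__init__()
        # Project both modalities into one shared semantic space (assumed dims).
        self.video_proj = nn.Linear(video_dim, shared_dim)
        self.text_proj = nn.Linear(text_dim, shared_dim)
        # Cross-attention lets video features attend to the retrieved sentences.
        self.cross_attn = nn.MultiheadAttention(shared_dim, num_heads, batch_first=True)

    def retrieve(self, video_emb, corpus_emb, k=5):
        # Retrieval unit (assumed strategy): pick the k corpus sentences whose
        # embeddings are most cosine-similar to the pooled video embedding.
        sims = F.cosine_similarity(video_emb.unsqueeze(1), corpus_emb.unsqueeze(0), dim=-1)
        return sims.topk(k, dim=1).indices            # (batch, k) sentence indices

    def forward(self, video_feats, retrieved_sent_feats):
        # Alignment unit (assumed design): map both inputs into the shared
        # space and fuse them, yielding video-text aligned representations.
        v = self.video_proj(video_feats)               # (batch, frames, shared_dim)
        t = self.text_proj(retrieved_sent_feats)       # (batch, k, shared_dim)
        aligned, _ = self.cross_attn(v, t, t)          # video queries, text keys/values
        return aligned                                  # passed on to the caption decoder
```

In this sketch the retrieved sentences act as the "semantic anchor" the abstract mentions: the decoder would consume the aligned output rather than raw visual features, which is the step the paper argues reduces the decoder's workload.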



      • Published in

        ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 2
        March 2023
        540 pages
        ISSN: 1551-6857
        EISSN: 1551-6865
        DOI: 10.1145/3572860
        • Editor: Abdulmotaleb El Saddik

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 6 February 2023
        • Online AM: 7 July 2022
        • Accepted: 23 June 2022
        • Revised: 8 May 2022
        • Received: 2 December 2021
        Published in TOMM Volume 19, Issue 2


        Qualifiers

        • research-article
        • Refereed
