research-article

Smart Director: An Event-Driven Directing System for Live Broadcasting

Published: 12 November 2021

Abstract

Live video broadcasting requires a range of skills and domain expertise to run a multi-camera production. As the number of cameras keeps increasing, directing a live sports broadcast has become more complicated and challenging than ever before. Broadcast directors need to be far more concentrated, responsive, and knowledgeable during the production. To relieve directors of this intensive effort, we develop an automated sports broadcast directing system, called Smart Director, which mimics the typical human-in-the-loop broadcasting process to automatically create near-professional broadcast programs in real time using a set of advanced multi-view video analysis algorithms. Inspired by the so-called “three-event” construction of sports broadcasts [14], we build our system as an event-driven pipeline consisting of three consecutive novel components: (1) Multi-View Event Localization, which detects events by modeling multi-view correlations; (2) Multi-View Highlight Detection, which ranks camera views by visual importance for view selection; and (3) the Auto-Broadcasting Scheduler, which controls the production of the broadcast video. To the best of our knowledge, our system is the first end-to-end automated directing system for multi-camera sports broadcasting that is completely driven by the semantic understanding of sports events. It is also the first system to address the novel problem of multi-view joint event detection through cross-view relation modeling. We conduct both objective and subjective evaluations on a real-world multi-camera soccer dataset, which demonstrate that the quality of our auto-generated videos is comparable to that of human-directed videos. Thanks to its faster response, our system is able to capture more fast-passing and short-duration events that are usually missed by human directors.
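
To make the three-stage, event-driven flow described in the abstract concrete, the following is a minimal Python sketch of the control flow only. It is not the authors' implementation: the function names (`localize_event`, `rank_views`, `schedule`, `direct_step`), the score dictionaries, the averaging rule, and the threshold are all hypothetical stand-ins for the learned multi-view models the paper describes.

```python
from typing import Dict, List, Optional


def localize_event(event_scores: Dict[str, Dict[str, float]],
                   threshold: float = 0.5) -> Optional[str]:
    """Stage 1 stand-in: fuse per-camera event scores by averaging across views.

    Assumption: each camera reports a score per event label. The paper uses
    cross-view relation modeling instead; plain averaging is only a placeholder.
    """
    if not event_scores:
        return None
    totals: Dict[str, float] = {}
    for scores in event_scores.values():
        for label, score in scores.items():
            totals[label] = totals.get(label, 0.0) + score
    if not totals:
        return None
    best_label, best_total = max(totals.items(), key=lambda kv: kv[1])
    mean_score = best_total / len(event_scores)
    return best_label if mean_score >= threshold else None


def rank_views(highlight_scores: Dict[str, float]) -> List[str]:
    """Stage 2 stand-in: order cameras by a (hypothetical) highlight score."""
    return sorted(highlight_scores, key=highlight_scores.get, reverse=True)


def schedule(event: Optional[str], ranked: List[str], default_camera: str) -> str:
    """Stage 3 stand-in: cut to the top-ranked view only while an event is active."""
    if event is None or not ranked:
        return default_camera
    return ranked[0]


def direct_step(event_scores: Dict[str, Dict[str, float]],
                highlight_scores: Dict[str, float],
                default_camera: str = "main") -> str:
    """Run the three stages once for the current time window."""
    event = localize_event(event_scores)
    ranked = rank_views(highlight_scores)
    return schedule(event, ranked, default_camera)


if __name__ == "__main__":
    scores = {"cam1": {"shot": 0.9, "foul": 0.1},
              "cam2": {"shot": 0.7, "foul": 0.2}}
    print(direct_step(scores, {"cam1": 0.3, "cam2": 0.8}))
```

The point of the sketch is the ordering of decisions: view ranking only affects the output when an event has been localized, which is one plausible reading of "event-driven" here; when no event clears the threshold, the sketch falls back to an assumed default camera.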

REFERENCES

  1. [1] Barnfield Andrew. 2013. Soccer, broadcasting, and narrative: On televising a live soccer match. Communication & Sport 1, 4 (2013), 326–341.
  2. [2] Buch Shyamal, Escorcia Victor, Shen Chuanqi, Ghanem Bernard, and Niebles Juan Carlos. 2017. SST: Single-stream temporal action proposals. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2911–2920.
  3. [3] Cai Qi, Pan Yingwei, Ngo Chong-Wah, Tian Xinmei, Duan Lingyu, and Yao Ting. 2019. Exploring object relation in mean teacher for cross-domain detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 11457–11466.
  4. [4] Chen Christine, Wang Oliver, Heinzle Simon, Carr Peter, Smolic Aljoscha, and Gross Markus. 2013. Computational sports broadcasting: Automated director assistance for live sports. In 2013 IEEE International Conference on Multimedia and Expo (ICME’13). IEEE, 1–6.
  5. [5] Chen Jianhui and Little James J. 2019. Sports camera calibration via synthetic data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. 0–0.
  6. [6] Chen Jianhui, Lu Keyu, Tian Sijia, and Little Jim. 2019. Learning sports camera selection from internet videos. In 2019 IEEE Winter Conference on Applications of Computer Vision (WACV’19). IEEE, 1682–1691.
  7. [7] Chen Jianhui, Meng Lili, and Little James J. 2018. Camera selection for broadcasting soccer games. In 2018 IEEE Winter Conference on Applications of Computer Vision (WACV’18). IEEE, 427–435.
  8. [8] Chen Yang, Pan Yingwei, Yao Ting, Tian Xinmei, and Mei Tao. 2019. Mocycle-GAN: Unpaired video-to-video translation. In Proceedings of the 27th ACM International Conference on Multimedia. 647–655.
  9. [9] Choi Kyu-Hyoung, Lee Sang-Wook, and Seo Yong-Duek. 2009. Automatic broadcast video generation for ball sports from multiple views. In Proceedings of the Korean Society of Broadcast Engineers Conference. The Korean Institute of Broadcast and Media Engineers, 193–198.
  10. [10] Daniyal Fahad and Cavallaro Andrea. 2011. Multi-camera scheduling for video production. In 2011 Conference for Visual Media Production. IEEE, 11–20.
  11. [11] Deng Jiajun, Pan Yingwei, Yao Ting, Zhou Wengang, Li Houqiang, and Mei Tao. 2019. Relation distillation networks for video object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 7023–7032.
  12. [12] Gaidon Adrien, Harchaoui Zaid, and Schmid Cordelia. 2013. Temporal localization of actions with actoms. IEEE Transactions on Pattern Analysis and Machine Intelligence 35, 11 (2013), 2782–2795.
  13. [13] Gao Jiyang, Chen Kan, and Nevatia Ram. 2018. CTAP: Complementary temporal action proposal generation. In Proceedings of the European Conference on Computer Vision (ECCV’18). 68–83.
  14. [14] Goldlust John. 2018. Playing for Keeps: Sport, the Media and Society. Hybrid Publishers.
  15. [15] Isola Phillip, Zhu Jun-Yan, Zhou Tinghui, and Efros Alexei A. 2017. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1125–1134.
  16. [16] Javed Ali, Bajwa Khalid Bashir, Malik Hafiz, and Irtaza Aun. 2016. An efficient framework for automatic highlights generation from sports videos. IEEE Signal Processing Letters 23, 7 (2016), 954–958.
  17. [17] Kim Hoseong, Mei Tao, Byun Hyeran, and Yao Ting. 2018. Exploiting web images for video highlight detection with triplet deep ranking. IEEE Transactions on Multimedia 20, 9 (2018), 2415–2426.
  18. [18] Kong Yu, Tao Zhiqiang, and Fu Yun. 2017. Deep sequential context networks for action prediction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1473–1481.
  19. [19] Leake Mackenzie, Davis Abe, Truong Anh, and Agrawala Maneesh. 2017. Computational video editing for dialogue-driven scenes. ACM Transactions on Graphics 36, 4 (2017), 130–1.
  20. [20] Lefevre Florent, Bombardier Vincent, Charpentier Patrick, Krommenacker Nicolas, and Petat Bertrand. 2018. Automatic camera selection in the context of basketball game. In International Conference on Image and Signal Processing. Springer, 72–79.
  21. [21] Li Chunyang, Jia Caiyan, Chen Zhineng, Gu Xiaoyan, and Bao Hongyun. 2019. psDirector: An automatic director for watching view generation from panoramic soccer video. In International Conference on Multimedia Modeling. Springer, 218–230.
  22. [22] Li Yehao, Pan Yingwei, Yao Ting, Chao Hongyang, Rui Yong, and Mei Tao. 2019. Learning click-based deep structure-preserving embeddings with visual attention. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 15, 3 (2019), 1–19.
  23. [23] Li Yehao, Yao Ting, Pan Yingwei, Chao Hongyang, and Mei Tao. 2018. Jointly localizing and describing events for dense video captioning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 7492–7500.
  24. [24] Li Yehao, Yao Ting, Pan Yingwei, Chao Hongyang, and Mei Tao. 2019. Deep metric learning with density adaptivity. IEEE Transactions on Multimedia 22, 5 (2019), 1285–1297.
  25. [25] Lin Tianwei, Zhao Xu, and Shou Zheng. 2017. Single shot temporal action detection. In Proceedings of the 25th ACM International Conference on Multimedia. 988–996.
  26. [26] Long Fuchen, Yao Ting, Qiu Zhaofan, Tian Xinmei, Luo Jiebo, and Mei Tao. 2019. Gaussian temporal awareness networks for action localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 344–353.
  27. [27] Mettes Pascal and Snoek Cees G. M. 2019. Pointly-supervised action localization. International Journal of Computer Vision 127, 3 (2019), 263–281.
  28. [28] Oneata Dan, Verbeek Jakob, and Schmid Cordelia. 2013. Action and event recognition with Fisher vectors on a compact feature set. In Proceedings of the IEEE International Conference on Computer Vision. 1817–1824.
  29. [29] Owens Jim. 2015. Television Sports Production. CRC Press.
  30. [30] Pan Yingwei, Li Yehao, Yao Ting, Mei Tao, Li Houqiang, and Rui Yong. 2016. Learning deep intrinsic video representation by exploring temporal coherence and graph structure. In IJCAI. Citeseer, 3832–3838.
  31. [31] Pan Yingwei, Yao Ting, Li Houqiang, Ngo Chong-Wah, and Mei Tao. 2015. Semi-supervised hashing with semantic confidence for large scale visual search. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. 53–62.
  32. [32] Pan Yingwei, Yao Ting, Li Yehao, and Mei Tao. 2020. X-linear attention networks for image captioning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 10971–10980.
  33. [33] Pan Yingwei, Yao Ting, Tian Xinmei, Li Houqiang, and Ngo Chong-Wah. 2014. Click-through-based subspace learning for image search. In Proceedings of the 22nd ACM International Conference on Multimedia. 233–236.
  34. [34] Pech-Pacheco José Luis, Cristóbal Gabriel, Chamorro-Martinez Jesús, and Fernández-Valdivia Joaquín. 2000. Diatom autofocusing in brightfield microscopy: A comparative study. In Proceedings 15th International Conference on Pattern Recognition (ICPR-2000), Vol. 3. IEEE, 314–317.
  35. [35] Potapov Danila, Douze Matthijs, Harchaoui Zaid, and Schmid Cordelia. 2014. Category-specific video summarization. In European Conference on Computer Vision. Springer, 540–555.
  36. [36] Qiu Zhaofan, Yao Ting, and Mei Tao. 2017. Learning spatio-temporal representation with pseudo-3D residual networks. In Proceedings of the IEEE International Conference on Computer Vision. 5533–5541.
  37. [37] Raventos Arnau, Quijada Raul, Torres Luis, and Tarrés Francesc. 2015. Automatic summarization of soccer highlights using audio-visual descriptors. SpringerPlus 4, 1 (2015), 1–19.
  38. [38] Rui Yong, Gupta Anoop, and Acero Alex. 2000. Automatically extracting highlights for TV baseball programs. In Proceedings of the 8th ACM International Conference on Multimedia. 105–115.
  39. [39] Shih Huang-Chia. 2017. A survey of content-aware video analysis for sports. IEEE Transactions on Circuits and Systems for Video Technology 28, 5 (2017), 1212–1231.
  40. [40] Shou Zheng, Wang Dongang, and Chang Shih-Fu. 2016. Temporal action localization in untrimmed videos via multi-stage CNNs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1049–1058.
  41. [41] Sun Chen, Shrivastava Abhinav, Vondrick Carl, Sukthankar Rahul, Murphy Kevin, and Schmid Cordelia. 2019. Relational action forecasting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 273–283.
  42. [42] Wang Jinjun, Xu Changsheng, Chng Engsiong, Lu Hanqing, and Tian Qi. 2008. Automatic composition of broadcast sports video. Multimedia Systems 14, 4 (2008), 179–193.
  43. [43] Wang Jinjun, Xu Changsheng, Chng Engsiong, Wah Kongwah, and Tian Qi. 2004. Automatic replay generation for soccer video broadcasting. In Proceedings of the 12th Annual ACM International Conference on Multimedia. 32–39.
  44. [44] Wang Xueting, Hara Kensho, Enokibori Yu, Hirayama Takatsugu, and Mase Kenji. 2016. Personal multi-view viewpoint recommendation based on trajectory distribution of the viewing target. In Proceedings of the 24th ACM International Conference on Multimedia. 471–475.
  45. [45] Wang Xueting, Muramatu Yuki, Hirayama Takatsugu, and Mase Kenji. 2014. Context-dependent viewpoint sequence recommendation system for multi-view video. In 2014 IEEE International Symposium on Multimedia. IEEE, 195–202.
  46. [46] Xiong Bo, Kalantidis Yannis, Ghadiyaram Deepti, and Grauman Kristen. 2019. Less is more: Learning highlight detection from video duration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 1258–1267.
  47. [47] Yao Ting, Mei Tao, and Rui Yong. 2016. Highlight detection with pairwise deep ranking for first-person video summarization. In CVPR.
  48. [48] Yao Ting, Pan Yingwei, Li Yehao, and Mei Tao. 2018. Exploring visual relationship for image captioning. In Proceedings of the European Conference on Computer Vision (ECCV’18). 684–699.
  49. [49] Yao Ting, Pan Yingwei, Li Yehao, and Mei Tao. 2019. Hierarchy parsing for image captioning. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 2621–2629.
  50. [50] Yao Ting, Zhang Yiheng, Qiu Zhaofan, Pan Yingwei, and Mei Tao. 2020. SeCo: Exploring sequence supervision for unsupervised representation learning. CoRR abs/2008.00975. arXiv:2008.00975 https://arxiv.org/abs/2008.00975
  51. [51] Yu Bin and Jain Anil K. 1997. Lane boundary detection using a multiresolution Hough transform. In Proceedings of International Conference on Image Processing, Vol. 2. IEEE, 748–751.
  52. [52] Zhang Ke, Chao Wei-Lun, Sha Fei, and Grauman Kristen. 2016. Video summarization with long short-term memory. In European Conference on Computer Vision. Springer, 766–782.
  53. [53] Zhang Ke, Grauman Kristen, and Sha Fei. 2018. Retrospective encoders for video summarization. In Proceedings of the European Conference on Computer Vision (ECCV’18). 383–399.
  54. [54] Zhang Ning, Liu Jingen, Wang Ke, Zeng Dan, and Mei Tao. 2020. Robust visual object tracking with two-stream residual convolutional networks. CoRR abs/2005.06536. arXiv:2005.06536 https://arxiv.org/abs/2005.06536
  55. [55] Zhang Shifeng, Zhu Xiangyu, Lei Zhen, Shi Hailin, Wang Xiaobo, and Li Stan Z. 2017. FaceBoxes: A CPU real-time face detector with high accuracy. In 2017 IEEE International Joint Conference on Biometrics (IJCB’17). IEEE, 1–9.
  56. [56] Zhao Jiaojiao and Snoek Cees G. M. 2019. Dance with flow: Two-in-one stream action detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 9935–9944.
  57. [57] Zhao Yue, Xiong Yuanjun, Wang Limin, Wu Zhirong, Tang Xiaoou, and Lin Dahua. 2017. Temporal action detection with structured segment networks. In Proceedings of the IEEE International Conference on Computer Vision. 2914–2923.
  58. [58] Zuo Jiawei, Chen Yue, Wang Linfang, Pan Yingwei, Yao Ting, Wang Ke, and Mei Tao. 2020. iDirector: An intelligent directing system for live broadcast. In Proceedings of the 28th ACM International Conference on Multimedia. 4545–4547.

            • Published in

              ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 4
              November 2021
              529 pages
              ISSN: 1551-6857
              EISSN: 1551-6865
              DOI: 10.1145/3492437

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 12 November 2021
              • Accepted: 1 February 2021
              • Revised: 1 January 2021
              • Received: 1 September 2020
              Qualifiers

              • research-article
              • Refereed
