research-article

Pedestrian-Aware Panoramic Video Stitching Based on a Structured Camera Array

Published: 12 November 2021

Abstract

A panorama stitching system is an indispensable module in surveillance and space exploration. Such a system lets the viewer understand the surroundings at a glance by aligning the surrounding images on a plane and fusing them naturally. The bottleneck of existing systems lies mainly in the alignment of adjacent images and the naturalness of the transition between them. When facing dynamic foregrounds, they may produce outputs with misaligned semantic objects, an artifact to which human perception is highly sensitive. We address three key issues in the existing workflow that affect its efficiency and the quality of the resulting panoramic video, and present Pedestrian360, a panoramic video system based on a structured camera array (a spatial surround-view camera system). First, to obtain a geometrically aligned 360° view in the horizontal direction, we build a unified multi-camera coordinate system via a novel refinement approach that jointly optimizes camera poses. Second, to eliminate brightness and color differences between images taken by different cameras, we design a photometric alignment approach that adds a bias term to the baseline linear adjustment model and solves it with two-step least squares. Third, considering that the human visual system is more sensitive to high-level semantic objects, such as pedestrians and vehicles, we integrate the results of instance segmentation into the dynamic programming framework of the seam-cutting step. To our knowledge, we are the first to introduce instance segmentation to the seam-cutting problem, which helps ensure the integrity of the salient objects in a panorama. Specifically, in our surveillance-oriented system, we choose the most significant target, pedestrians, as the seam-avoidance target, which accounts for the name Pedestrian360. To validate the effectiveness and efficiency of Pedestrian360, we establish a large-scale dataset composed of videos with pedestrians in five scenes. The test results on this dataset demonstrate the superiority of Pedestrian360 over its competitors. Experimental results show that Pedestrian360 can stitch videos at 12 to 26 fps, depending on the number of objects in the scene and how frequently they move. To make our reported results reproducible, the relevant code and collected data are publicly available at https://cslinzhang.github.io/Pedestrian360-Homepage/.
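
The third contribution, seam cutting by dynamic programming guided by instance segmentation, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `find_seam` and `pedestrian_aware_cost`, the absolute-difference cost, and the `penalty` value are all assumptions chosen for illustration. The idea shown is only that pixels covered by a pedestrian mask receive a large extra cost, so the minimum-cost seam through the overlap region tends to route around them.

```python
import numpy as np


def pedestrian_aware_cost(img_a, img_b, person_mask, penalty=1e6):
    """Per-pixel seam cost over the overlap of two aligned images.

    Assumption: cost = absolute intensity difference plus a large
    penalty wherever the (hypothetical) instance mask marks a person.
    """
    diff = np.abs(img_a.astype(np.float64) - img_b.astype(np.float64))
    if diff.ndim == 3:  # sum over color channels if present
        diff = diff.sum(axis=2)
    return diff + penalty * person_mask.astype(np.float64)


def find_seam(cost):
    """Return, for each row, the column of a minimum-cost vertical seam.

    Classic dynamic programming: each pixel may connect to the three
    nearest pixels in the row above.
    """
    h, w = cost.shape
    acc = cost.astype(np.float64).copy()
    for y in range(1, h):
        prev = acc[y - 1]
        left = np.concatenate(([np.inf], prev[:-1]))
        right = np.concatenate((prev[1:], [np.inf]))
        acc[y] += np.minimum(prev, np.minimum(left, right))

    # Backtrack from the cheapest pixel in the last row.
    seam = np.empty(h, dtype=np.int64)
    seam[-1] = int(np.argmin(acc[-1]))
    for y in range(h - 2, -1, -1):
        x = seam[y + 1]
        lo, hi = max(x - 1, 0), min(x + 2, w)
        seam[y] = lo + int(np.argmin(acc[y, lo:hi]))
    return seam


# Toy overlap region: identical images, so only the mask drives the seam.
rng = np.random.default_rng(0)
a = rng.random((6, 8))
b = a.copy()
mask = np.zeros((6, 8), dtype=bool)
mask[2:4, 0:5] = True  # a "pedestrian" blocking the left side of rows 2-3
seam = find_seam(pedestrian_aware_cost(a, b, mask))
print(seam)
```

In this toy case the seam has to bend to the right of the masked block, which is the behavior the abstract describes as preserving the integrity of salient objects. A real system would derive `person_mask` from an instance segmentation network and would likely add temporal terms for video, neither of which is modeled here.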


          • Published in

          ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 4
          November 2021
          529 pages
          ISSN: 1551-6857
          EISSN: 1551-6865
          DOI: 10.1145/3492437

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 12 November 2021
          • Accepted: 1 April 2021
          • Revised: 1 March 2021
          • Received: 1 November 2020

          Qualifiers

          • research-article
          • Refereed
