research-article

CRAR: Accelerating Stereo Matching with Cascaded Residual Regression and Adaptive Refinement

Published: 04 March 2022

Abstract

Dense stereo matching estimates the depth of each pixel in the reference images. Deep learning has recently driven rapid progress in stereo matching, and state-of-the-art results are achieved by models built on deep convolutional neural networks. These models, however, introduce a considerable computational burden that slows inference. To reduce it, previous works down-sample the input images to shrink their spatial size, but down-sampling raises both the error rate and its lower bound. In this article, we instead accelerate stereo matching by improving the network structure. Inspired by network compression, we apply sparsification and decomposition to squeeze the computationally expensive cost optimization network: it is sparsified and then decomposed into smaller networks, which are designed and trained in a cascaded manner to come as close as possible to the performance of the larger network. Previous methods also rely on numerous refinement methods to adjust the coarse disparity. We integrate these refinement methods into a unified algorithm that exploits the parallelism of the devices it runs on to further accelerate inference. Extensive experiments on the KITTI 2015, KITTI 2012, and Middlebury datasets demonstrate the efficiency of our method.
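
The abstract gives no implementation details, so the following is only a toy sketch of the general cascaded residual idea, not the article's network. It assumes a single 1-D scanline, swaps the learned cost optimization networks for a classical sum-of-absolute-differences (SAD) window search, and uses hypothetical names throughout (`downsample`, `sad_cost`, `best_disparity`, `cascaded_disparity`). A coarse stage searches the full (reduced) disparity range at low resolution; each later stage only regresses a small residual around the up-sampled estimate.

```python
import random


def downsample(row):
    """Halve a scanline by averaging each pair of neighbouring pixels."""
    return [(row[i] + row[i + 1]) / 2 for i in range(0, len(row) - 1, 2)]


def sad_cost(left, right, x, d, radius=2):
    """SAD between the window around left[x] and the window around right[x - d].

    Indices are clamped at the borders (an assumption of this sketch).
    """
    cost = 0.0
    for k in range(-radius, radius + 1):
        xl = min(max(x + k, 0), len(left) - 1)
        xr = min(max(x + k - d, 0), len(right) - 1)
        cost += abs(left[xl] - right[xr])
    return cost


def best_disparity(left, right, x, candidates):
    """Pick the candidate disparity with the lowest matching cost."""
    return min(candidates, key=lambda d: sad_cost(left, right, x, d))


def cascaded_disparity(left, right, max_disp, levels=2):
    """Coarse-to-fine disparity: full search once, then residual-only stages."""
    # Build a resolution pyramid: index 0 is full resolution.
    pyramid = [(left, right)]
    for _ in range(levels):
        prev_left, prev_right = pyramid[-1]
        pyramid.append((downsample(prev_left), downsample(prev_right)))

    # Stage 0: exhaustive search, but only at the coarsest resolution.
    lvl_left, lvl_right = pyramid[-1]
    coarse_range = range(0, (max_disp >> levels) + 1)
    disp = [best_disparity(lvl_left, lvl_right, x, coarse_range)
            for x in range(len(lvl_left))]

    # Later stages: up-sample the estimate and search a residual in {-1, 0, +1}.
    for lvl_left, lvl_right in reversed(pyramid[:-1]):
        refined = []
        for x in range(len(lvl_left)):
            up = 2 * disp[min(x // 2, len(disp) - 1)]
            candidates = [max(up + res, 0) for res in (-1, 0, 1)]
            refined.append(best_disparity(lvl_left, lvl_right, x, candidates))
        disp = refined
    return disp


# Synthetic scanline pair with a constant true disparity of 8 pixels.
rng = random.Random(0)
texture = [rng.random() for _ in range(64 + 8)]
left, right = texture[:64], texture[8:]
disp = cascaded_disparity(left, right, max_disp=16)
print(sorted(set(disp[16:56])))  # disparities recovered away from the borders
```

With `max_disp=16` and two levels, each full-resolution pixel evaluates 3 residual candidates instead of the 17 an exhaustive search would need, which is the kind of saving a cascade is meant to provide. Border pixels are unreliable in this sketch because of the clamping.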



      • Published in

        ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 18, Issue 3
        August 2022
        478 pages
        ISSN: 1551-6857
        EISSN: 1551-6865
        DOI: 10.1145/3505208

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 4 March 2022
        • Accepted: 1 September 2021
        • Revised: 1 June 2021
        • Received: 1 July 2020

        Qualifiers

        • research-article
        • Refereed
