research-article

3D CV Descriptor on Parallel Heterogeneous Platforms

Published: 24 September 2015

Abstract

Embedded three-dimensional (3D) Computer Vision (CV) is considered a technology enabler for future consumer applications and is attracting wide interest in academia and industry. However, 3D CV processing is computation-intensive. Its high computational cost stems directly from the processing of 3D point clouds, with 3D descriptor computation representing one of the main bottlenecks. Understanding the main computational challenges of 3D CV applications, as well as the key characteristics, enabling features, and limitations of current computing platforms, is therefore essential for identifying the directions in which future embedded processing systems targeting 3D CV should evolve.
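As a rough illustration of where the per-point cost comes from: in the original SHOT formulation, each keypoint's local reference frame (LRF) is derived from the eigenvectors of a distance-weighted covariance matrix built from all neighbors within a support radius R, each neighbor weighted by (R - d_i). The minimal sketch below computes only that matrix for a single keypoint. It assumes the cloud is a plain list of (x, y, z) tuples and uses a brute-force neighbor search; the function name is hypothetical, and the eigendecomposition and sign-disambiguation steps are omitted.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def weighted_covariance(keypoint: Point, cloud: Sequence[Point],
                        radius: float) -> List[List[float]]:
    """Distance-weighted 3x3 scatter matrix around `keypoint` (illustrative).

    Each neighbor p_i at distance d_i < radius contributes
    (radius - d_i) * (p_i - keypoint)(p_i - keypoint)^T, and the sum is
    divided by the total weight. Assumption: brute-force O(N) search.
    """
    m = [[0.0] * 3 for _ in range(3)]
    total_weight = 0.0
    for p in cloud:
        diff = [p[k] - keypoint[k] for k in range(3)]
        d = math.sqrt(sum(c * c for c in diff))
        if d >= radius:
            continue  # outside the spherical support region
        w = radius - d  # closer neighbors get larger weights
        total_weight += w
        for r in range(3):
            for c in range(3):
                m[r][c] += w * diff[r] * diff[c]
    if total_weight == 0.0:
        raise ValueError("no neighbors inside the support radius")
    return [[v / total_weight for v in row] for row in m]
```

With a brute-force search this step alone costs O(N) per keypoint, so describing K keypoints costs O(K·N). Spatial indexes such as k-d trees reduce the search cost, but the per-keypoint arithmetic remains, which is one reason descriptor computation is a natural target for acceleration.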

In this work, an innovative and complex 3D descriptor called SHOT has been ported to a high-end and an embedded computing platform. The high-end system combines a high-performance Intel CPU with an Nvidia GPU. The embedded platform, instead, combines an ARM-based processor with the STHORM accelerator, a low-power many-core accelerator developed by STMicroelectronics that features up to 64 computational units. The SHOT descriptor has been parallelized using the OpenCL programming model on both platforms.
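Because each keypoint's descriptor depends only on that keypoint's neighborhood, SHOT maps naturally onto data-parallel models such as OpenCL. The sketch below is a simplified, hypothetical illustration, not the authors' kernel. It assumes the neighbors are already expressed in the keypoint's local reference frame (keypoint at the origin, z axis aligned with the LRF normal) with unit normals, uses the SHOT layout of 8 azimuth × 2 elevation × 2 radial volumes with 11 cosine bins (352 values), and omits the interpolation between adjacent bins used by the full descriptor. The names `shot_like_descriptor` and `describe_all` are illustrative only, and the one-keypoint-per-work-item mapping noted in the comments is an assumption, not necessarily the decomposition used in this work.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

N_AZIMUTH, N_ELEVATION, N_RADIAL, N_COS_BINS = 8, 2, 2, 11
DESCRIPTOR_LEN = N_AZIMUTH * N_ELEVATION * N_RADIAL * N_COS_BINS  # 352


def shot_like_descriptor(points: Sequence[Vec3], normals: Sequence[Vec3],
                         radius: float) -> List[float]:
    """Simplified SHOT-style signature for ONE keypoint (no interpolation)."""
    hist = [0.0] * DESCRIPTOR_LEN
    for (x, y, z), n in zip(points, normals):
        d = math.sqrt(x * x + y * y + z * z)
        if d >= radius:
            continue  # outside the spherical support
        phi = math.atan2(y, x) % (2 * math.pi)           # azimuth in [0, 2*pi)
        azimuth = min(int(phi / (2 * math.pi / N_AZIMUTH)), N_AZIMUTH - 1)
        elevation = 1 if z >= 0 else 0                    # upper / lower half
        radial = 0 if d < radius / 2 else 1               # inner / outer shell
        volume = (radial * N_ELEVATION + elevation) * N_AZIMUTH + azimuth
        cos_theta = max(-1.0, min(1.0, n[2]))             # cosine to LRF z axis
        cos_bin = min(int((cos_theta + 1.0) / 2.0 * N_COS_BINS), N_COS_BINS - 1)
        hist[volume * N_COS_BINS + cos_bin] += 1.0
    norm = math.sqrt(sum(v * v for v in hist))            # unit L2 norm
    return [v / norm for v in hist] if norm > 0 else hist


def describe_all(neighborhoods, radius: float) -> List[List[float]]:
    # Each keypoint is processed independently, so (by assumption) one
    # OpenCL work-item or work-group per keypoint is a natural mapping.
    return [shot_like_descriptor(pts, nrm, radius) for pts, nrm in neighborhoods]
```

For example, a single neighbor at (0.1, 0.0, 0.1) with normal (0, 0, 1) and radius 1.0 falls in the inner, upper, first-azimuth volume (index 8) and the last cosine bin, so the normalized descriptor has a 1.0 at position 98 and zeros elsewhere.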

Finally, we have performed an in-depth performance comparison and analysis of general-purpose processors and accelerators in both the high-end and embedded domains, highlighting the main differences in Hardware/Software (HW/SW) design methodologies and approaches between high-end and embedded systems targeting 3D CV applications.
