skip to main content
tutorial

Heterogeneous Managed Runtime Systems: A Computer Vision Case Study

Authors Info & Claims
Published:08 April 2017Publication History
Skip Abstract Section

Abstract

Real-time 3D space understanding is becoming prevalent across a wide range of applications and hardware platforms. To meet the desired Quality of Service (QoS), computer vision applications tend to be heavily parallelized and exploit any available hardware accelerators. Current approaches to achieving real-time computer vision, evolve around programming languages typically associated with High Performance Computing along with binding extensions for OpenCL or CUDA execution.

Such implementations, although high performing, lack portability across the wide range of diverse hardware resources and accelerators. In this paper, we showcase how a complex computer vision application can be implemented within a managed runtime system. We discuss the complexities of achieving high-performing and portable execution across embedded and desktop configurations. Furthermore, we demonstrate that it is possible to achieve the QoS target of over 30 frames per second (FPS) by exploiting FPGA and GPGPU acceleration transparently through the managed runtime system.

References

  1. B. Alpern, C. R. Attanasio, J. J. Barton, M. G. Burke, P. Cheng, J.-D. Choi, A. Cocchi, S.J. Fink, D. Grove, M. Hind, S. F. Hummel, D. Lieber, V. Litvinov, M. F. Mergen, T. Ngo, J. R. Russell, V. Sarkar, M. J. Serrano, J. C. Shepherd, S. E. Smith, V. C. Sreedhar, H. Srinivasan, and J. Whaley. 2000. The JalapeñO Virtual Machine. IBM Systems Journal (2000).Google ScholarGoogle Scholar
  2. AMD-Aparapi. 2017. http://developer.amd.com/tools-and-sdks/heterogeneous-computing/aparapi/. (Feb. 2017).Google ScholarGoogle Scholar
  3. Arvind. 2003. Bluespec: A Language for Hardware Design, Simulation, Synthesis and Verification Invited Talk. In Proceedings of the First ACM and IEEE International Conference on Formal Methods and Models for Co-Design (MEMOCODE '03). IEEE Computer Society, Washington, DC, USA, 249--. http://dl.acm.org/citation.cfm?id=823453.823860Google ScholarGoogle Scholar
  4. Joshua Auerbach, David F. Bacon, Perry Cheng, and Rodric Rabbah. 2010. Lime: A Java-compatible and Synthesizable Language for Heterogeneous Architectures. In Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA '10). ACM, New York, NY, USA, 89--108. DOI: http://dx.doi.org/10.1145/1869459.1869469 Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. James Bergstra, Olivier Breuleux, Frédéric Bastien, Pascal Lamblin, Razvan Pascanu, Guillaume Desjardins, Joseph Turian, David Warde-Farley, and Yoshua Bengio. 2010. Theano: a CPU and GPU Math Expression Compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy).Google ScholarGoogle ScholarCross RefCross Ref
  6. P. J. Besl and H. D. McKay. 1992. A method for registration of 3-D shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 14, 2 (Feb 1992), 239--256. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. S. M. Blackburn, R. Garner, C. Hoffman, A. M. Khan, K. S. McKinley, R. Bentzur, A. Diwan, D. Feinberg, D. Frampton, S. Z. Guyer, M. Hirzel, A. Hosking, M. Jump, H. Lee, J. E. B. Moss, A. Phansalkar, D. Stefanović, T. VanDrunen, D. von Dincklage, and B. Wiedermann. 2006. The DaCapo Benchmarks: Java Benchmarking Development and Analysis. In OOPSLA '06: Proceedings of the 21st annual ACM SIGPLAN conference on Object-Oriented Programing, Systems, Languages, and Applications. ACM Press.Google ScholarGoogle Scholar
  8. J. Butzke, K. Daniilidis, A. Kushleyev, D. D. Lee, M. Likhachev, C. Phillips, and M. Phillips. 2012. The University of Pennsylvania MAGIC 2010 multi-robot unmanned vehicle system. Journal of Field Robotics 29, 5 (2012), 745--761. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Bryan Catanzaro, Michael Garland, and Kurt Keutzer. 2011. Copperhead: Compiling an Embedded Data Parallel Language. In Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming (PPoPP '11). ACM, New York, NY, USA, 47--56. DOI: http://dx.doi.org/10.1145/1941553.1941562 Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Olivier Chafik. 2017. ScalaCL: Faster Scala: optimizing compiler plugin + GPU-based collections (OpenCL). (Feb. 2017). Retrieved March 11, 2017 from http://code.google.com/p/scalaclGoogle ScholarGoogle Scholar
  11. Manuel M.T. Chakravarty, Gabriele Keller, Sean Lee, Trevor L. McDonell, and Vinod Grover. 2011. Accelerating Haskell Array Codes with Multicore GPUs. In Proceedings of the Sixth Workshop on Declarative Aspects of Multicore Programming (DAMP '11). ACM, New York, NY, USA, 3--14. DOI:http://dx.doi.org/10.1145/1926354.1926358 Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. James Clarkson, Christos Kotselidis, Gavin Brown, and Mikel Luján. 2017. Boosting Java Performance using GPGPUs. In Proceedings of the 30th International Conference on Architecture of Computing Systems (ARCS '17). Google ScholarGoogle ScholarCross RefCross Ref
  13. Ronan Collobert, Koray Kavukcuoglu, and Clément Farabet. 2011. Torch7: A Matlab-like Environment for Machine Learning. In BigLearn, NIPS Workshop.Google ScholarGoogle Scholar
  14. Georg Dotzler, Ronald Veldema, and Michael Klemm. 2010. JCudaMP. In Proceedings of the 3rd International Workshop on Multicore Software Engineering. DOI:http://dx.doi.org/10.1145/1808954.1808959 Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. EJML. 2017. (Feb. 2017). Retrieved March 11, 2017 from http://ejml.orgGoogle ScholarGoogle Scholar
  16. Juan José Fumero, Michel Steuwer, and Christophe Dubach. 2014. A Composable Array Function Interface for Heterogeneous Computing in Java. In Proceedings of ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming (ARRAY'14). ACM, New York, NY, USA, 44:44--44:49. DOI:http://dx.doi.org/10.1145/2627373.2627381 Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. A. Handa, T. Whelan, J.B. McDonald, and A.J. Davison. 2014. A Benchmark for RGB-D Visual Odometry, 3D Reconstruction and SLAM. In ICRA.Google ScholarGoogle Scholar
  18. A. Handa, T. Whelan, J.B. McDonald, and A.J. Davison. 2014. A Benchmark for RGB-D Visual Odometry, 3D Reconstruction and SLAM. In IEEE Intl. Conf. on Robotics and Automation, ICRA. Hong Kong, China.Google ScholarGoogle Scholar
  19. Akihiro Hayashi, Max Grossman, Jisheng Zhao, Jun Shirako, and Vivek Sarkar. 2013. Accelerating Habanero-Java Programs with OpenCL Generation. In Proceedings of the 2013 International Conference on Principles and Practices of Programming on the Java Platform: Virtual Machines, Languages, and Tools. DOI:http://dx.doi.org/10.1145/2500828.2500840 Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Sylvain Henry. 2013. ViperVM: A Runtime System for Parallel Functional High-performance Computing on Heterogeneous Architectures. In Proceedings of the 2Nd ACM SIGPLAN Workshop on Functional High-performance Computing (FHPC '13). ACM, New York, NY, USA, 3--12. DOI:http://dx.doi.org/10.1145/2502323.2502329 Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Stephan Herhut, Richard L. Hudson, Tatiana Shpeisman, and Jaswanth Sreeram. 2013. River Trail: A Path to Parallelism in JavaScript. In Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications (OOPSLA '13). ACM, New York, NY, USA, 729--744. DOI:http://dx.doi.org/10.1145/2509136.2509516 Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. JEP 243: Java-Level JVM Compiler Interface. 2017. http://openjdk.java.net/jeps/243. (Feb. 2017).Google ScholarGoogle Scholar
  23. Java bindings for OpenCL. 2017. (Feb. 2017). Retrieved March 11, 2017 from http://www.jocl.org/Google ScholarGoogle Scholar
  24. Andreas Klöckner, Nicolas Pinto, Yunsup Lee, Bryan Catanzaro, Paul Ivanov, and Ahmed Fasih. 2012. PyCUDA and PyOpenCL: A Scripting-based Approach to GPU Run-time Code Generation. Parallel Comput. 38, 3 (March 2012), 157--174. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Christos Kotselidis, Andrey Rodchenko, Colin Barrett, Andy Nisbet, John Mawer, Will Toms, James Clarksonand Cosmin Gorgovan, Amanieu d'Antras, Yaman Cakmakci, Thanos Stratikopoulos, Sebatian Werner, Jim Garside, Javier Navaridas, Antoniu Pop, John Goodacre, and Mikel Luján. 2016. Project Beehive: A Hardware/Software Co-designed Stack for Runtime and Architectural Research. In Proceedings of the 9th International Workshop on Programmability and Architectures for Heterogeneous Multicores (MULTIPROG '16).Google ScholarGoogle Scholar
  26. Geoffrey Mainland and Greg Morrisett. 2010. Nikola: Embedding Compiled GPU Functions in Haskell. In Proceedings of the Third ACM Haskell Symposium on Haskell (Haskell '10). ACM, New York, NY, USA, 67--78. DOI:http://dx.doi.org/10.1145/1863523.1863533 Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Luigi Nardi, Bruno Bodin, M. Zeeshan Zia, John Mawer, Andy Nisbet, Paul H.J. Kelly, Andrew J. Davison, Mikel Luján, Michael F. P. O'Boyle, Graham Riley, Nigel Topham, and Steve Furber. 2015.. Introducing SLAMBench, a performance and accuracy benchmarking methodology for SLAM. In ICRA. Google ScholarGoogle ScholarCross RefCross Ref
  28. Richard A. Newcombe, Shahram Izadi, Otmar Hilliges, David Molyneaux, David Kim, Andrew J. Davison, Pushmeet Kohli, Jamie Shotton, Steve Hodges, and Andrew Fitzgibbon. 2011. KinectFusion: Real-time Dense Surface Mapping and Tracking. In Proceedings of the 2011 10th IEEE International Symposium on Mixed and Augmented Reality (ISMAR '11). IEEE Computer Society, Washington, DC, USA, 127--136. DOI:http://dx.doi.org/10.1109/ISMAR.2011.6092378 Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Nathaniel Nystrom, Derek White, and Kishen Das. 2011. Firepile: Runtime Compilation for GPUs in Scala. In Proceedings of the 10th ACM International Conference on Generative Programming and Component Engineering (GPCE '11). ACM, New York, NY, USA, 107--116. DOI: http://dx.doi.org/10.1145/2047862.2047883 Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. OpenJDK. 2017. http://openjdk.java.net/. (Feb. 2017).Google ScholarGoogle Scholar
  31. P.C. Pratt-Szeliga, J.W. Fawcett, and R.D. Welch. 2012. Rootbeer: Seamlessly Using GPUs from Java. In Proceedings of 14th International IEEE High Performance Computing and Communication Conference on Embedded Software and Systems. DOI:http://dx.doi.org/10.1109/HPCC.2012.57 Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Alex Rubinsteyn, Eric Hielscher, Nathaniel Weinman, and Dennis Shasha. 2012. Parakeet: A Just-in-time Parallel Accelerator for Python. In Proceedings of the 4th USENIX Conference on Hot Topics in Parallelism (HotPar'12). USENIX Association, Berkeley, CA, USA, 14--14.Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. SpecJVM2008. 2017. https://www.spec.org/jvm2008/. (Feb. 2017).Google ScholarGoogle Scholar
  34. Lukas Stadler, Thomas Würthinger, and Hanspeter Mössenböck. 2014. Partial Escape Analysis and Scalar Replacement for Java. In Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO '14). ACM, New York, NY, USA, 165:165--165:174. DOI:http://dx.doi.org/10.1145/2544137.2544157Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Tango. 2017. (Feb. 2017). Retrieved March 11, 2017 from https://get.google.com/tango/Google ScholarGoogle Scholar
  36. Christian Wimmer, Michael Haupt, Michael L. Van De Vanter, Mick Jordan, Laurent Daynès, and Douglas Simon. 2013. Maxine: An Approachable Virtual Machine for, and in, Java. ACM Trans. Archit. Code Optim. (January 2013).Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Yonghong Yan, Max Grossman, and Vivek Sarkar. 2009. JCUDA: A Programmer-Friendly Interface for Accelerating Java Programs with CUDA. In Euro-Par 2009 Parallel Processing, Henk Sips, Dick Epema, and Hai-Xiang Lin (Eds.), Vol. 5704. Springer Berlin Heidelberg. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Wojciech Zaremba, Yuan Lin, and Vinod Grover. 2012. JaBEE: Framework for Object-oriented Java Bytecode Compilation and Execution on Graphics Processor Units. In Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units (GPGPU-5). ACM, New York, NY, USA, 74--83. DOI:http://dx.doi.org/10.1145/2159430.2159439 Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Zhengyou Zhang. 1994. Iterative Point Matching for Registration of Free-form Curves and Surfaces. Int. J. Comput. Vision 13, 2 (Oct. 1994), 119--152. Google ScholarGoogle ScholarDigital LibraryDigital Library

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in

Full Access

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader
About Cookies On This Site

We use cookies to ensure that we give you the best experience on our website.

Learn more

Got it!