GMAI: Understanding and Exploiting the Internals of GPU Resource Allocation in Critical Systems

Published: 26 September 2020
Abstract

Critical real-time systems require strict resource provisioning in terms of memory and timing. The constant need for higher performance in these systems has recently led industry to adopt GPUs. However, GPU software ecosystems are typically closed source, forcing system engineers to treat them as black boxes and complicating resource provisioning. In this work, we reverse engineer the internal operations of GPU system software to better understand its observed behaviour and how it manages resources internally. We present our methodology, incorporated in GMAI (GPU Memory Allocation Inspector), a tool that allows system engineers to accurately determine the amount of resources their critical systems require, avoiding underprovisioning. We first apply our methodology to a wide range of GPU hardware from different vendors, showing its generality in obtaining the properties of GPU memory allocators. Next, we demonstrate the benefits of this knowledge for resource provisioning in two case studies from the automotive domain, where the actual memory consumption is up to 5.6× higher than the memory requested by the application.
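The abstract does not spell out how allocator properties are extracted. As a rough illustration of the general black-box idea only, and not GMAI's actual methodology, the Python sketch below models a hypothetical allocator that silently rounds every request up to a hidden granularity, infers that granularity from observed changes in memory usage, and then compares actual against requested memory. `BlackBoxAllocator`, `probe_granularity`, `overhead_ratio`, and the 2 MiB granularity are all illustrative assumptions; on real hardware, the usage query could be built on an API such as CUDA's `cudaMemGetInfo`, which reports free and total device memory.

```python
from functools import reduce
from math import gcd


class BlackBoxAllocator:
    """Hypothetical stand-in for a closed-source GPU memory allocator.

    Assumption for illustration: every request is rounded up to a hidden
    granularity. Real allocators may also pool, cache, or use size classes.
    """

    def __init__(self, granularity: int) -> None:
        self._granularity = granularity  # hidden from the probing code
        self._used = 0
        self._blocks: dict[int, int] = {}
        self._next_handle = 0

    def malloc(self, size: int) -> int:
        g = self._granularity
        actual = ((size + g - 1) // g) * g  # round request up to granularity
        handle = self._next_handle
        self._next_handle += 1
        self._blocks[handle] = actual
        self._used += actual
        return handle

    def free(self, handle: int) -> None:
        self._used -= self._blocks.pop(handle)

    def used_bytes(self) -> int:
        # Only observable quantity, analogous to (total - free) device memory.
        return self._used


def probe_granularity(alloc: BlackBoxAllocator, sizes: list[int]) -> int:
    """Infer the rounding step as the GCD of observed usage deltas.

    Including a 1-byte probe makes the smallest delta equal the granularity.
    """
    deltas = []
    for size in sizes:
        before = alloc.used_bytes()
        handle = alloc.malloc(size)
        deltas.append(alloc.used_bytes() - before)
        alloc.free(handle)
    return reduce(gcd, deltas)


def overhead_ratio(alloc: BlackBoxAllocator, requests: list[int]) -> float:
    """Ratio of memory actually consumed to memory requested."""
    before = alloc.used_bytes()
    handles = [alloc.malloc(r) for r in requests]
    actual = alloc.used_bytes() - before
    for handle in handles:
        alloc.free(handle)
    return actual / sum(requests)


if __name__ == "__main__":
    MIB = 1024 * 1024
    alloc = BlackBoxAllocator(granularity=2 * MIB)  # hypothetical value
    print("inferred granularity:", probe_granularity(alloc, [1, 100, 3 * MIB]))
    print("actual/requested:", overhead_ratio(alloc, [1024, 4096, 3 * MIB]))
```

Rounding small buffers up to a large block is only one simple way actual consumption can exceed the requested amount; this sketch does not model pooling, caching, or per-context overheads that real GPU software stacks may add.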