research-article

Region-based memory management for GPU programming languages: enabling rich data structures on a spartan host

Published: 15 October 2014
Abstract

Graphics processing units (GPUs) can effectively accelerate many applications, but their applicability has been largely limited to problems whose solutions can be expressed neatly in terms of linear algebra. Indeed, most GPU programming languages limit the user to simple data structures, typically only multidimensional rectangular arrays of scalar values. Many algorithms are more naturally expressed using higher-level language features, such as algebraic data types (ADTs) and first-class procedures, yet building these structures in a manner suitable for a GPU remains a challenge. We present a region-based memory management approach that enables rich data structures in Harlan, a language for data-parallel computing. Regions enable rich data structures by providing a uniform representation for pointers on both the CPU and GPU and by providing a means of transferring entire data structures between CPU and GPU memory. We demonstrate Harlan's increased expressiveness on several example programs and show that Harlan performs well on more traditional data-parallel problems.
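The two roles the abstract assigns to regions can be illustrated with a small sketch. This is not Harlan's implementation; it only assumes that a region is a contiguous buffer and that a pointer is a byte offset from the region's base, so the same pointer value stays valid wherever the buffer is copied. The names `Region`, `cons`, `transfer`, `build_list`, and `list_sum` are hypothetical, and the "device" here is just a second host buffer standing in for GPU memory.

```python
import struct


class Region:
    """A contiguous buffer; 'pointers' are byte offsets from the region base."""

    NULL = 0  # offset 0 is reserved so it can serve as the null pointer

    def __init__(self, capacity=1024):
        self.buf = bytearray(capacity)
        self.top = 8  # bump-allocation cursor; skips the reserved null slot

    def alloc(self, size):
        """Bump-allocate `size` bytes and return their offset."""
        if self.top + size > len(self.buf):
            raise MemoryError("region full")
        offset = self.top
        self.top += size
        return offset

    def cons(self, value, tail):
        """Allocate a list cell: an int32 value followed by a uint32 tail offset."""
        cell = self.alloc(8)
        struct.pack_into("<iI", self.buf, cell, value, tail)
        return cell

    def transfer(self):
        """Stand-in for a CPU-to-GPU copy: one bulk copy of the whole buffer."""
        other = Region(len(self.buf))
        other.buf[:] = self.buf
        other.top = self.top
        return other


def build_list(region, values):
    """Build a singly linked list inside `region` and return the head offset."""
    head = Region.NULL
    for v in reversed(values):
        head = region.cons(v, head)
    return head


def list_sum(region, head):
    """Follow offset 'pointers' through the region, summing the cell values."""
    total = 0
    while head != Region.NULL:
        value, head = struct.unpack_from("<iI", region.buf, head)
        total += value
    return total


host = Region()
head = build_list(host, [1, 2, 3, 4])
device = host.transfer()  # the same `head` offset is valid in the copy
```

Because every link is stored relative to the region base, the linked list needs no pointer rewriting after the copy: `list_sum(device, head)` walks the copied structure using the original `head` value.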
