Abstract
Graphics processing units (GPUs) can effectively accelerate many applications, but their applicability has been largely limited to problems whose solutions can be expressed neatly in terms of linear algebra. Indeed, most GPU programming languages limit the user to simple data structures, typically only multidimensional rectangular arrays of scalar values. Many algorithms are more naturally expressed using higher-level language features, such as algebraic data types (ADTs) and first-class procedures, yet building these structures in a manner suitable for a GPU remains a challenge. We present a region-based memory management approach that enables rich data structures in Harlan, a language for data-parallel computing. Regions enable rich data structures by providing a uniform representation for pointers on both the CPU and GPU and by providing a means of transferring entire data structures between CPU and GPU memory. We demonstrate Harlan's increased expressiveness on several example programs and show that Harlan performs well on more traditional data-parallel problems.
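The two properties attributed to regions above (a pointer representation that is valid on both sides, and whole-structure transfer) can be illustrated with a small sketch. This is not Harlan's implementation: the `Region` class, its `alloc`, `cons`, and `transfer` methods, the 8-byte cell layout, and the use of a plain byte copy to stand in for a CPU-to-GPU transfer are all assumptions made for illustration. The idea shown is that if pointers are stored as offsets from the region base rather than as absolute addresses, a linked structure survives a flat copy into a different memory space without any pointer fix-ups.

```python
import struct


class Region:
    """A contiguous buffer; 'pointers' are byte offsets from the region base.

    Hypothetical illustration only, not Harlan's actual region layout.
    """

    NULL = 0  # offset 0 is reserved so it can serve as the null pointer

    def __init__(self, capacity=1024):
        self.buf = bytearray(capacity)
        self.next_free = 8  # reserve a header so no object lives at offset 0

    def alloc(self, size):
        """Bump-allocate `size` bytes and return the offset of the new block."""
        offset = self.next_free
        if offset + size > len(self.buf):
            raise MemoryError("region full")
        self.next_free += size
        return offset

    def cons(self, value, tail):
        """Allocate a list cell: int32 value followed by uint32 offset of the next cell."""
        cell = self.alloc(8)
        struct.pack_into("<iI", self.buf, cell, value, tail)
        return cell

    def transfer(self):
        """Stand-in for a CPU-to-GPU copy: one flat copy of the bytes, no pointer rewriting."""
        other = Region(len(self.buf))
        other.buf[:] = self.buf
        other.next_free = self.next_free
        return other


def to_list(region, cell):
    """Walk a linked list stored in `region`, starting at offset `cell`."""
    out = []
    while cell != Region.NULL:
        value, cell = struct.unpack_from("<iI", region.buf, cell)
        out.append(value)
    return out


# Build the list 1 -> 2 -> 3 in a "host" region, then copy the region wholesale.
host = Region()
head = Region.NULL
for v in (3, 2, 1):
    head = host.cons(v, head)

device = host.transfer()
print(to_list(device, head))  # the same offset `head` is valid in the copy
```

Because `head` is an offset, it refers to the same cell in `device` as in `host`. With absolute addresses, every pointer inside the structure would need to be rewritten after the copy.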
Region-based memory management for GPU programming languages: enabling rich data structures on a spartan host
OOPSLA '14: Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications