Abstract
GPUs have become part of most commodity systems. Nonetheless, they are often underutilized when not executing graphics-intensive or special-purpose numerical computations, which are rare in consumer workloads. Emerging architectures, such as integrated CPU/GPU combinations, may create an opportunity to utilize these otherwise unused cycles for offloading traditional systems tasks. Garbage collection appears to be a particularly promising candidate for offloading, due to the popularity of managed languages on consumer devices.
We investigate the challenges of offloading garbage collection to a GPU by examining the performance trade-offs for the mark phase of a mark & sweep garbage collector. We present a theoretical analysis and an algorithm that demonstrate the feasibility of this approach. We also discuss a number of algorithmic design trade-offs involved in leveraging the strengths and capabilities of GPU hardware. Our algorithm has been integrated into the Jikes RVM, and we present promising performance results.
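The mark phase is a reachability traversal of the heap's object graph. The sketch below is an illustration only and is not the paper's algorithm, which this abstract does not describe. It shows the level-synchronous, frontier-based traversal commonly used for graph search on GPUs. It assumes the heap is given as an adjacency list of object indices (`refs`), and the function name `mark_frontier` is hypothetical. On a GPU, each frontier object would be processed by its own work-item. In a typical formulation, each level would be one kernel launch, so deep, narrow graphs such as long linked lists yield many launches with little parallelism in each. Here the per-item work runs sequentially.

```python
from typing import List, Sequence, Tuple


def mark_frontier(
    refs: Sequence[Sequence[int]],  # refs[i]: indices of objects that object i points to (assumed heap model)
    roots: Sequence[int],           # indices of root objects
) -> Tuple[List[bool], int]:
    """Hypothetical level-synchronous mark phase.

    Returns the mark bits and the number of frontier levels expanded.
    In a GPU formulation, each level would roughly correspond to one
    kernel launch over the current frontier.
    """
    marked = [False] * len(refs)
    frontier: List[int] = []
    for r in roots:
        if not marked[r]:
            marked[r] = True
            frontier.append(r)

    levels = 0
    while frontier:
        levels += 1
        next_frontier: List[int] = []
        # Data-parallel step: conceptually one work-item per frontier object.
        for obj in frontier:
            for child in refs[obj]:
                # On a GPU this check-and-set would have to be atomic
                # (e.g. OpenCL atomic_or or atomic_cmpxchg on a mark word)
                # so two work-items cannot both enqueue the same child.
                if not marked[child]:
                    marked[child] = True
                    next_frontier.append(child)
        frontier = next_frontier
    return marked, levels


if __name__ == "__main__":
    # Object 4 points into the graph, but nothing reachable points to it.
    heap = [[1, 2], [3], [3], [], [0]]
    print(mark_frontier(heap, roots=[0]))  # ([True, True, True, True, False], 3)
```

Object 4 stays unmarked and would be reclaimed by the sweep. The level count of 3 reflects the depth of the reachable graph, not its size.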
GPUs as an opportunity for offloading garbage collection
ISMM '12: Proceedings of the 2012 International Symposium on Memory Management