Abstract

Despite rapid increases in memory capacity, reconfigurable hardware is still programmed in a very low-level manner, generally without any dynamic allocation at all. This limits productivity, especially as larger chips encourage increasingly complex designs.
Prior work has shown that it is possible to implement a real-time collector in hardware and achieve stall-free operation, but at the price of severe restrictions on object layouts. We present the first hardware garbage collector capable of collecting multiple inter-connected heaps, thereby allowing a rich set of object types.
We show that, for a modest additional cost in logic and memory, we can support multiple heaps at a clock frequency competitive with monolithic, fixed-layout heaps. We evaluate the hardware design by synthesizing it for a Xilinx FPGA and using co-simulation to measure the run-time behavior over a set of four benchmarks. Even at high allocation and mutation rates, the collector sustains stall-free operation (100% minimum mutator utilization) with up to four inter-connected heaps, while requiring only between 1.1 and 1.7 times the maximum live memory of the application.
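The two headline metrics can be made concrete with a small sketch. It assumes a per-cycle trace recording whether the mutator made progress, and computes minimum mutator utilization (MMU) as the worst fraction of productive cycles over any window of a given length. The function names, the trace format, and the sample numbers are hypothetical and chosen only for illustration; they are not the authors' measurement setup or results.

```python
def minimum_mutator_utilization(ran, window):
    """Worst-case fraction of cycles the mutator ran in any `window` consecutive cycles.

    `ran` is a hypothetical per-cycle trace: True if the mutator made
    progress in that cycle, False if it was stalled by the collector.
    """
    if not 0 < window <= len(ran):
        raise ValueError("window must be between 1 and len(ran)")
    busy = sum(ran[:window])          # productive cycles in the first window
    worst = busy
    for i in range(window, len(ran)):
        busy += ran[i] - ran[i - window]  # slide the window by one cycle
        worst = min(worst, busy)
    return worst / window


def space_overhead(heap_words, max_live_words):
    """Heap size expressed as a multiple of the application's maximum live memory."""
    return heap_words / max_live_words


# Illustrative traces (invented data, not measurements from the paper).
stall_free = [True] * 1000
one_stall = [True] * 500 + [False] * 10 + [True] * 490

mmu_stall_free = minimum_mutator_utilization(stall_free, 100)
mmu_one_stall = minimum_mutator_utilization(one_stall, 100)
overhead = space_overhead(1700, 1000)
```

A stall-free trace yields an MMU of 1.0 at every window size, which is what "100% minimum mutator utilization" means. A single 10-cycle stall already lowers the MMU for a 100-cycle window to 0.9, and to 0.0 for any window of 10 cycles or fewer.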
Parallel real-time garbage collection of multiple heaps in reconfigurable hardware
ISMM '14: Proceedings of the 2014 International Symposium on Memory Management