Abstract

Programmers are turning to radical architectures such as reconfigurable hardware (FPGAs) to achieve performance. But such systems, programmed at a very low level in languages with impoverished abstractions, are orders of magnitude more complex to use than conventional CPUs. The continued exponential increase in transistors, combined with the desire to implement ever more sophisticated algorithms, makes it imperative that such systems be programmed at much higher levels of abstraction. One of the fundamental high-level language features is automatic memory management in the form of garbage collection.
We present the first implementation of a complete garbage collector in hardware (as opposed to previous "hardware-assist" techniques), using an FPGA and its on-chip memory. Using a completely concurrent snapshot algorithm, it provides single-cycle access to the heap, and never stalls the mutator for even a single cycle, achieving a deterministic mutator utilization (MMU) of 100%.
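As a rough software illustration of what a snapshot algorithm implies for the mutator, the sketch below models a Yuasa-style snapshot-at-the-beginning collector. It is not the hardware design described here: the `Heap` class, its method names, and the dictionary-based heap are assumptions made purely for illustration. While a collection is in progress, every pointer overwrite records the old target so that everything reachable when the collection started gets marked, and new objects are allocated already marked.

```python
class Heap:
    """Toy heap with a snapshot-at-the-beginning write barrier (illustrative only)."""

    def __init__(self):
        self.fields = {}        # object id -> list of pointer fields (ids or None)
        self.marked = set()     # objects proven live in the current collection
        self.mark_stack = []    # objects waiting to be traced
        self.collecting = False
        self.next_id = 0

    def alloc(self, nfields):
        oid = self.next_id
        self.next_id += 1
        self.fields[oid] = [None] * nfields
        if self.collecting:
            self.marked.add(oid)            # objects born during collection are kept
        return oid

    def write(self, obj, index, new):
        old = self.fields[obj][index]
        if self.collecting and old is not None:
            self.mark_stack.append(old)     # barrier: preserve the snapshot's edge
        self.fields[obj][index] = new

    def start_collection(self, roots):
        self.collecting = True
        self.marked = set()
        self.mark_stack = [r for r in roots if r is not None]

    def mark_step(self):
        """Trace one object; returns False once the mark stack is empty."""
        if not self.mark_stack:
            return False
        oid = self.mark_stack.pop()
        if oid not in self.marked:
            self.marked.add(oid)
            self.mark_stack.extend(c for c in self.fields[oid] if c is not None)
        return True

    def finish_collection(self):
        while self.mark_step():
            pass
        for oid in list(self.fields):       # sweep: free everything left unmarked
            if oid not in self.marked:
                del self.fields[oid]
        self.collecting = False
```

In this model, an object that becomes unreachable after a collection has started survives that collection and is reclaimed by the next one, which is the usual cost of snapshot-style marking.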
We have synthesized the collector to hardware and show that it never consumes more than 1% of the logic resources of a high-end FPGA. For comparison we also implemented explicit (malloc/free) memory management, and show that real-time collection is about 4% to 17% slower than malloc, with comparable energy consumption. Surprisingly, in hardware real-time collection is superior to stop-the-world collection on every performance axis, and even for stressful micro-benchmarks can achieve 100% MMU with heaps as small as 1.01 to 1.4 times the absolute minimum.
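To make the MMU figures concrete: MMU is conventionally expanded as minimum mutator utilization, the smallest fraction of any time window of a given length during which the mutator is running. The sketch below computes it from a list of collector pauses; the function name `mmu`, its parameters, and the assumption that pauses are non-overlapping `(start, end)` intervals are illustrative choices, not taken from the paper.

```python
def mmu(pauses, window, total):
    """Minimum mutator utilization for windows of length `window` in [0, total].

    `pauses` is a list of non-overlapping (start, end) intervals during which
    the mutator is stalled (an assumption of this sketch).
    """
    if not 0 < window <= total:
        raise ValueError("window must satisfy 0 < window <= total")

    def paused(t):
        # Total stalled time inside the window [t, t + window].
        return sum(max(0.0, min(e, t + window) - max(s, t)) for s, e in pauses)

    # Stalled time is piecewise linear in t, so its maximum lies at a point
    # where a window edge meets a pause edge, or at the ends of the trace.
    candidates = {0.0, total - window}
    for s, e in pauses:
        for t in (s, e, s - window, e - window):
            candidates.add(min(max(t, 0.0), total - window))

    worst = max(paused(t) for t in candidates)
    return 1.0 - worst / window
```

A collector that never stalls the mutator corresponds to an empty pause list, for which this function returns 1.0 (100%) at every window size.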
And then there were none: a stall-free real-time garbage collector for reconfigurable hardware
PLDI '12: Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation