research-article

ProtoFlex: Towards Scalable, Full-System Multiprocessor Simulations Using FPGAs

Published: 01 June 2009

Abstract

Functional full-system simulators are powerful and versatile research tools for accelerating architectural exploration and advanced software development. Their main shortcoming is limited throughput when simulating large multiprocessor systems with hundreds or thousands of processors or when instrumentation is introduced. We propose the ProtoFlex simulation architecture, which uses FPGAs to accelerate full-system multiprocessor simulation and to facilitate high-performance instrumentation. Prior FPGA approaches that prototype a complete system in hardware are either too complex when scaling to large-scale configurations or require significant effort to provide full-system support. In contrast, ProtoFlex virtualizes the execution of many logical processors onto a consolidated number of multiple-context execution engines on the FPGA. Through virtualization, the number of engines can be judiciously scaled, as needed, to deliver the necessary simulation performance at a large savings in complexity. Further, to achieve low-complexity full-system support, a hybrid simulation technique called transplanting allows implementing in the FPGA only the frequently encountered behaviors, while a software simulator preserves the abstraction of a complete system.
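The two ideas in the abstract, virtualizing many logical processors onto a few multiple-context engines and transplanting rare behaviors to software, can be illustrated with a toy software model. The sketch below is not the ProtoFlex implementation: the three-operation "engine" instruction set, the round-robin interleaving, the static mapping of logical processors to engines, and every name in it (`ENGINE_OPS`, `Context`, `software_fallback`, `simulate`) are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical split: operations the "engine" handles directly.
# Anything else is treated as a rare behavior and transplanted to software.
ENGINE_OPS = {"add", "load", "store"}


@dataclass
class Context:
    """Architectural state of one logical processor (illustrative only)."""
    cpu_id: int
    program: list
    pc: int = 0

    def done(self):
        return self.pc >= len(self.program)


@dataclass
class Stats:
    engine_executed: int = 0
    transplanted: int = 0
    trace: list = field(default_factory=list)


def software_fallback(ctx, op, stats):
    """Stand-in for the software simulator that handles a rare behavior."""
    stats.transplanted += 1
    stats.trace.append((ctx.cpu_id, op, "software"))


def run_engine(contexts, stats):
    """One engine interleaves its contexts, one instruction per turn."""
    pending = [ctx for ctx in contexts if not ctx.done()]
    while pending:
        still_running = []
        for ctx in pending:
            op = ctx.program[ctx.pc]
            if op in ENGINE_OPS:
                stats.engine_executed += 1
                stats.trace.append((ctx.cpu_id, op, "engine"))
            else:
                software_fallback(ctx, op, stats)
            ctx.pc += 1
            if not ctx.done():
                still_running.append(ctx)
        pending = still_running


def simulate(programs, num_engines):
    """Map logical processors onto a smaller number of engines and run them."""
    stats = Stats()
    engines = [[] for _ in range(num_engines)]
    for cpu_id, program in enumerate(programs):
        engines[cpu_id % num_engines].append(Context(cpu_id, list(program)))
    for contexts in engines:
        run_engine(contexts, stats)
    return stats
```

Calling `simulate(programs, num_engines=2)` with four short programs places two logical processors on each engine. Changing `num_engines` alters only how the contexts are grouped, not what each program executes, which mirrors the abstract's point that the number of engines can be scaled independently of the number of simulated processors.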

We have created a first instance of the ProtoFlex simulation architecture, which is an FPGA-based, full-system functional simulator for a 16-way UltraSPARC III symmetric multiprocessor server, hosted on a single Xilinx Virtex-II XCV2P70 FPGA. On average, the simulator achieves a 38× speedup (and as high as 49×) over comparable software simulation across a suite of applications, including OLTP on a commercial database server. We also demonstrate the advantages of minimal-overhead FPGA-accelerated instrumentation through a CMP cache simulation technique that runs orders-of-magnitude faster than software.


    • Published in

      ACM Transactions on Reconfigurable Technology and Systems, Volume 2, Issue 2
      June 2009
      211 pages
      ISSN: 1936-7406
      EISSN: 1936-7414
      DOI: 10.1145/1534916

      Copyright © 2009 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 June 2009
      • Accepted: 1 November 2008
      • Revised: 1 August 2008
      • Received: 1 June 2008

      Qualifiers

      • research-article
      • Research
      • Refereed
