Research article, CPSWeek conference proceedings. DOI: 10.1145/1375657.1375670

FaCSim: a fast and cycle-accurate architecture simulator for embedded systems

Published: 12 June 2008

ABSTRACT

There is strong demand in the embedded systems area for fast, cycle-accurate virtual platforms on which developers can do meaningful software development, including performance debugging, in the context of the entire platform. In this paper, we describe the design and implementation of a fast and cycle-accurate architecture simulator called FaCSim as a first step towards such a virtual platform. FaCSim accurately models the ARM9E-S processor core and the memory subsystem of the ARM926EJ-S processor. It accurately simulates exceptions and interrupts to enable whole-system simulation, including the OS. Because it is implemented in a modular manner in C++, it can easily be extended with other system components by subclassing existing classes or adding new ones. FaCSim is based on an interpretive simulation technique to provide flexibility, yet it still achieves high speed. It enables fast cycle-accurate architecture simulation by means of three mechanisms. First, instead of performing cycle-by-cycle simulation, it computes the cycles elapsed in each pipeline stage as a chunk and incrementally adds them up to advance the core clock. Second, it uses a basic-block cache that caches decoded instructions at the basic-block level. Finally, it is parallelized to exploit the multicore systems that are now widely available. Using 21 applications from the EEMBC benchmark suite, we validate FaCSim's accuracy against the ARM926EJ-S development board from ARM and find it accurate within a ±7% error margin. Owing to basic-block-level caching and parallelization, FaCSim is, on average, more than three times faster than ARMulator and more than six times faster than SimpleScalar.
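The abstract says FaCSim is written in modular C++ and can be extended with new system components by subclassing or adding classes. The sketch below illustrates that style of extension under assumed names: `BusDevice`, `Ram`, `TimerDevice`, and `Bus` are hypothetical and are not taken from FaCSim's code. A new peripheral only has to implement the two virtual accessors to become reachable through the bus.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical interface for any component attached to the simulated bus.
class BusDevice {
public:
    virtual ~BusDevice() = default;
    virtual uint32_t read32(uint32_t offset) = 0;
    virtual void write32(uint32_t offset, uint32_t value) = 0;
};

// Word-addressed RAM model.
class Ram : public BusDevice {
public:
    explicit Ram(std::size_t bytes) : words_(bytes / 4, 0) {}
    uint32_t read32(uint32_t offset) override { return words_.at(offset / 4); }
    void write32(uint32_t offset, uint32_t value) override { words_.at(offset / 4) = value; }

private:
    std::vector<uint32_t> words_;
};

// A new peripheral added purely by subclassing: a toy timer whose register at
// offset 0 is advanced by the simulator through tick().
class TimerDevice : public BusDevice {
public:
    void tick(uint64_t cycles) { count_ += static_cast<uint32_t>(cycles); }
    uint32_t read32(uint32_t offset) override { return offset == 0 ? count_ : 0; }
    void write32(uint32_t offset, uint32_t value) override {
        if (offset == 0) count_ = value;
    }

private:
    uint32_t count_ = 0;
};

// Routes each access to the device whose address range contains it.
class Bus {
public:
    void attach(uint32_t base, uint32_t size, std::unique_ptr<BusDevice> dev) {
        assert(size > 0);
        regions_.push_back(Region{base, size, std::move(dev)});
    }
    uint32_t read32(uint32_t addr) {
        Region& r = find(addr);
        return r.dev->read32(addr - r.base);
    }
    void write32(uint32_t addr, uint32_t value) {
        Region& r = find(addr);
        r.dev->write32(addr - r.base, value);
    }

private:
    struct Region {
        uint32_t base;
        uint32_t size;
        std::unique_ptr<BusDevice> dev;
    };
    Region& find(uint32_t addr) {
        for (Region& r : regions_)
            if (addr - r.base < r.size) return r;  // unsigned wrap rejects addr < base
        throw std::out_of_range("access to unmapped address");
    }
    std::vector<Region> regions_;
};
```

Attaching a device is a single `attach` call with a base address and size; the code that issues bus accesses never needs to know the concrete class. A linear search over regions is enough for a handful of devices; a platform with many devices might sort the regions or use a page table instead.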
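The first speed mechanism, charging the cycles spent in each pipeline stage as one chunk rather than ticking the clock cycle by cycle, can be sketched for an in-order five-stage pipeline (Fetch, Decode, Execute, Memory, Writeback, as in the ARM9E-S). This is a simplified model, not FaCSim's timing model: per-stage latencies are assumed to come from some timing table, stages are treated as independent resources with buffering between them, and forwarding and interlock effects are assumed to be folded into those latencies. `ChunkedPipelineClock` and `StageCycles` are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// Five stages, as in the ARM9E-S pipeline: Fetch, Decode, Execute, Memory, Writeback.
constexpr int kStages = 5;

// Per-stage latency of one instruction in cycles (assumed to come from a
// hypothetical timing table, e.g. extra Execute cycles for a multiply or
// extra Memory cycles on a data-cache miss).
using StageCycles = std::array<uint64_t, kStages>;

class ChunkedPipelineClock {
public:
    // Account for one instruction and return the cycle at which it leaves Writeback.
    uint64_t retire(const StageCycles& lat) {
        uint64_t t = 0;  // cycle at which this instruction left the previous stage
        for (int s = 0; s < kStages; ++s) {
            assert(lat[s] >= 1 && "every stage takes at least one cycle");
            // The instruction enters stage s once it has left stage s-1 and the
            // previous instruction has left stage s.
            const uint64_t start = std::max(t, stage_free_[s]);
            t = start + lat[s];  // add the whole stage latency as one chunk
            stage_free_[s] = t;
        }
        core_clock_ = t;
        return t;
    }

    uint64_t core_clock() const { return core_clock_; }

private:
    std::array<uint64_t, kStages> stage_free_{};  // cycle at which each stage became free
    uint64_t core_clock_ = 0;
};
```

With every stage taking one cycle, a run of N instructions retires at cycle N + 4, the fill latency of a five-stage pipeline; giving one instruction a three-cycle Execute stage delays it and the instruction behind it by two cycles. The loop does a fixed amount of work per instruction per stage no matter how many cycles a stage takes, which is where the saving over cycle-by-cycle simulation comes from.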
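The second mechanism, a cache of decoded instructions kept at basic-block granularity, can be sketched as a hash map from a block's start address to its decoded instructions. This is an illustrative sketch under stated assumptions: `BasicBlockCache`, `DecodedInsn`, and the fetch callback are hypothetical, the decoder only recognizes ARM B/BL as block terminators, and the 64-instruction cap on block length is arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical decoded form of one 32-bit ARM instruction. A real decoder
// would also record the opcode, operands, and a handler pointer.
struct DecodedInsn {
    uint32_t pc;
    uint32_t raw;
    bool ends_block;  // true for a control transfer that ends the basic block
};

struct BasicBlock {
    std::vector<DecodedInsn> insns;
};

class BasicBlockCache {
public:
    using FetchFn = std::function<uint32_t(uint32_t)>;  // returns the word at a guest address

    explicit BasicBlockCache(FetchFn fetch) : fetch_(std::move(fetch)) {}

    // Return the decoded block starting at pc, decoding it only on the first visit.
    const BasicBlock& lookup(uint32_t pc) {
        assert(pc % 4 == 0 && "ARM instructions are word-aligned");
        auto it = blocks_.find(pc);
        if (it != blocks_.end()) {
            ++hits_;
            return it->second;
        }
        ++misses_;
        BasicBlock bb;
        for (uint32_t addr = pc;; addr += 4) {
            bb.insns.push_back(decode(addr, fetch_(addr)));
            if (bb.insns.back().ends_block || bb.insns.size() >= kMaxBlockLen) break;
        }
        // References to unordered_map elements stay valid across later insertions.
        return blocks_.emplace(pc, std::move(bb)).first->second;
    }

    // Must be called when guest code is overwritten (e.g. self-modifying code).
    void invalidate_all() { blocks_.clear(); }

    std::size_t hits() const { return hits_; }
    std::size_t misses() const { return misses_; }

private:
    static constexpr std::size_t kMaxBlockLen = 64;

    // Simplified decoder: only ARM B/BL (bits [27:25] == 0b101) end a block.
    // Real code must also treat BX, loads into PC, and other writes to PC as block ends.
    static DecodedInsn decode(uint32_t pc, uint32_t raw) {
        const bool is_branch = ((raw >> 25) & 0x7u) == 0x5u;
        return DecodedInsn{pc, raw, is_branch};
    }

    FetchFn fetch_;
    std::unordered_map<uint32_t, BasicBlock> blocks_;
    std::size_t hits_ = 0;
    std::size_t misses_ = 0;
};
```

On a second visit to a block, the interpreter skips decoding entirely. Any write to guest code (self-modifying code, or the OS loading a new program) must invalidate the affected entries; `invalidate_all` is the bluntest option and also invalidates every reference previously returned by `lookup`.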
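The abstract does not say how FaCSim divides its work across cores. One possible arrangement, assumed here rather than taken from the paper, is to run a functional front end and a timing back end on separate threads connected by a lock-free single-producer/single-consumer queue. `SpscQueue`, `TraceRecord`, and `simulate_two_threads` are hypothetical names, and the code needs C++17 for `std::optional`.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <thread>
#include <vector>

// Lock-free single-producer/single-consumer ring buffer (Lamport-style).
// One slot is always left empty so that "full" and "empty" are distinguishable.
template <typename T>
class SpscQueue {
public:
    explicit SpscQueue(std::size_t capacity) : buf_(capacity + 1) { assert(capacity > 0); }

    bool try_push(const T& value) {  // producer thread only
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % buf_.size();
        if (next == tail_.load(std::memory_order_acquire)) return false;  // full
        buf_[head] = value;
        head_.store(next, std::memory_order_release);  // publish the slot
        return true;
    }

    std::optional<T> try_pop() {  // consumer thread only
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) return std::nullopt;  // empty
        T value = buf_[tail];
        tail_.store((tail + 1) % buf_.size(), std::memory_order_release);  // free the slot
        return value;
    }

private:
    std::vector<T> buf_;
    std::atomic<std::size_t> head_{0};  // next slot to write; written only by the producer
    std::atomic<std::size_t> tail_{0};  // next slot to read; written only by the consumer
};

// Hypothetical record handed from a functional front end to a timing back end.
struct TraceRecord {
    uint32_t pc;
    uint32_t cycles;  // cycle cost assigned by some timing model (assumption)
};

// The producer thread stands in for the functional simulator; the calling
// thread plays the timing model and advances the core clock.
uint64_t simulate_two_threads(const std::vector<TraceRecord>& trace) {
    SpscQueue<TraceRecord> queue(1024);
    std::thread functional([&] {
        for (const TraceRecord& r : trace)
            while (!queue.try_push(r)) std::this_thread::yield();  // full: let the consumer run
    });
    uint64_t core_clock = 0;
    for (std::size_t consumed = 0; consumed < trace.size();) {
        if (std::optional<TraceRecord> r = queue.try_pop()) {
            core_clock += r->cycles;
            ++consumed;
        } else {
            std::this_thread::yield();  // empty: wait for the producer
        }
    }
    functional.join();
    return core_clock;
}
```

Each index has exactly one writer, so no mutex is needed: the release store of an index, paired with the acquire load of it on the other thread, guarantees that the slot contents are visible before the index update is. Busy-waiting with `yield` keeps the sketch short; a production simulator might batch records to reduce cross-core traffic.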

