ABSTRACT
There has been strong demand in the embedded systems area for fast, cycle-accurate virtual platforms on which developers can do meaningful software development, including performance debugging, in the context of the entire platform. In this paper, we describe the design and implementation of a fast and cycle-accurate architecture simulator called FaCSim as a first step towards such a virtual platform. FaCSim accurately models the ARM9E-S processor core and the ARM926EJ-S processor's memory subsystem. It accurately simulates exceptions and interrupts to enable whole-system simulation, including the OS. Because it is implemented in a modular manner in C++, it can be easily extended with other system components by subclassing or adding new classes. FaCSim is based on an interpretive simulation technique, which provides flexibility while still achieving high speed. It enables fast cycle-accurate architecture simulation by means of three mechanisms. First, instead of performing cycle-by-cycle simulation, it computes the cycles elapsed in each pipeline stage as a chunk and incrementally adds the chunks up to advance the core clock. Second, it uses a basic-block cache that caches decoded instructions at the basic-block level. Finally, it is parallelized to exploit multicore systems, which are now widely available. FaCSim's accuracy is validated against the ARM926EJ-S development board from ARM using 21 applications from the EEMBC benchmark suite, and it is accurate within a ±7% error margin. Due to basic-block level caching and parallelization, FaCSim is, on average, more than three times faster than ARMulator and more than six times faster than SimpleScalar.
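As a rough illustration of the first two mechanisms, the following C++ sketch caches decoded instructions per basic block and advances the clock by a precomputed chunk of cycles rather than one cycle at a time. It is not FaCSim's implementation: the toy instruction encoding, the fixed per-instruction cycle costs, and the names `DecodedInsn`, `BasicBlock`, and `Simulator` are all assumptions made for this example, and the per-pipeline-stage accounting described above is collapsed into a single cost per instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical toy ISA (not ARM): just enough to show the idea.
enum class Op { Add, Load, Branch };

struct DecodedInsn {
    Op op;
    std::uint32_t cycles;  // assumed fixed cost, for illustration only
};

struct BasicBlock {
    std::vector<DecodedInsn> insns;  // decoded once, reused on later visits
    std::uint64_t cycle_chunk = 0;   // total cycles of the whole block
};

// Toy decoder: the low two bits of the raw word select the operation.
inline DecodedInsn decode(std::uint32_t raw) {
    switch (raw & 3u) {
        case 0:  return {Op::Add, 1};
        case 1:  return {Op::Load, 3};
        default: return {Op::Branch, 2};
    }
}

class Simulator {
public:
    explicit Simulator(std::vector<std::uint32_t> memory)
        : mem_(std::move(memory)) {}

    // Simulate the basic block starting at word index pc and return the
    // index of the word that follows it.
    std::size_t run_block(std::size_t pc) {
        assert(pc < mem_.size() && "pc must point into program memory");
        auto it = cache_.find(pc);
        if (it == cache_.end()) {
            // Cache miss: decode the block once and remember it.
            ++decode_misses_;
            it = cache_.emplace(pc, build_block(pc)).first;
        }
        const BasicBlock& bb = it->second;
        clock_ += bb.cycle_chunk;  // advance by a chunk, not cycle by cycle
        return pc + bb.insns.size();
    }

    std::uint64_t clock() const { return clock_; }
    std::uint64_t decode_misses() const { return decode_misses_; }

private:
    // Decode instructions until a branch (which ends the block) or the
    // end of memory, summing their cycle costs along the way.
    BasicBlock build_block(std::size_t pc) const {
        BasicBlock bb;
        while (pc < mem_.size()) {
            const DecodedInsn d = decode(mem_[pc++]);
            bb.insns.push_back(d);
            bb.cycle_chunk += d.cycles;
            if (d.op == Op::Branch) break;
        }
        return bb;
    }

    std::vector<std::uint32_t> mem_;
    std::unordered_map<std::size_t, BasicBlock> cache_;
    std::uint64_t clock_ = 0;
    std::uint64_t decode_misses_ = 0;
};
```

Calling `run_block` twice on the same start address decodes the block only once; the second call just adds the cached `cycle_chunk` to the clock. A real cycle-accurate model would also have to account for stalls, cache misses, and interrupts, which this sketch ignores.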
FaCSim: a fast and cycle-accurate architecture simulator for embedded systems (LCTES '08)