Abstract
Despite recent progress in improving the speed of instruction-accurate simulators, cycle-accurate simulation is still prohibitively slow for all but the most basic programs. In this article, we present a statistical machine learning approach to performance estimation in fast, instruction-accurate simulators and evaluate our methodology comprehensively against three popular embedded RISC processors and about 300 embedded applications. We show that our methodology provides accurate performance estimates with an average error of less than 3.9% while operating, on average, about 14.5 times faster than cycle-accurate simulation.
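To illustrate the general idea of learning cycle counts from the counters that an instruction-accurate simulator can report, here is a minimal sketch. The abstract does not say which model or features are used, so everything below is an assumption made for illustration: ordinary least squares stands in for the statistical model, the three instruction classes (ALU, memory, branch) are hypothetical features, and the training numbers are synthetic rather than measured.

```python
# Illustrative sketch only: ordinary least squares is an assumed stand-in
# for the statistical model; features and data are synthetic.

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_weights(counts, cycles):
    """Fit per-class cycle weights w minimizing ||counts w - cycles||^2 (normal equations)."""
    k = len(counts[0])
    ata = [[sum(row[i] * row[j] for row in counts) for j in range(k)] for i in range(k)]
    atb = [sum(row[i] * y for row, y in zip(counts, cycles)) for i in range(k)]
    return solve(ata, atb)

def predict_cycles(weights, counts_row):
    """Estimate the cycle count of one program from its instruction-class counts."""
    return sum(w * c for w, c in zip(weights, counts_row))

def mean_abs_pct_error(predicted, actual):
    """Average absolute error of the estimates, in percent of the true cycle count."""
    return 100.0 * sum(abs(p - a) / a for p, a in zip(predicted, actual)) / len(actual)

# Synthetic training set: (ALU, memory, branch) instruction counts per program,
# with cycle counts as a cycle-accurate reference run might report them.
train_counts = [[100, 20, 10], [50, 40, 5], [80, 10, 30], [30, 60, 20]]
train_cycles = [170, 145, 190, 210]

weights = fit_weights(train_counts, train_cycles)
estimates = [predict_cycles(weights, row) for row in train_counts]
print("fitted weights:", weights)
print("training error (%):", mean_abs_pct_error(estimates, train_cycles))
```

In such a setup the slow cycle-accurate simulator would be needed only to produce the training cycle counts; afterwards, estimates come from the fast instruction-accurate run plus a cheap model evaluation.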
Statistical Performance Modeling in Functional Instruction Set Simulators