
Efficiently exploring architectural design spaces via predictive modeling

Published: 20 October 2006

Abstract

Architects use cycle-by-cycle simulation to evaluate design choices and to understand tradeoffs and interactions among design parameters. Efficiently exploring exponential-size design spaces with many interacting parameters remains an open problem: the sheer number of experiments renders detailed simulation intractable. We attack this problem via an automated approach that builds accurate, confident predictive design-space models. We simulate sampled points, using the results to teach our models the function describing relationships among design parameters. The models produce highly accurate performance estimates for other points in the space, can be queried to predict the performance impact of architectural changes, and are very fast compared to simulation, enabling efficient discovery of tradeoffs among parameters in different regions. We validate our approach via sensitivity studies on memory hierarchy and CPU design spaces: our models generally predict IPC with only 1-2% error and reduce required simulation by two orders of magnitude. We also show the efficacy of our technique for exploring chip multiprocessor (CMP) design spaces: when trained on a 1% sample drawn from a CMP design space with 250K points and up to 55x performance swings among different system configurations, our models predict performance with only 4-5% error on average. Our approach combines with techniques that reduce time per simulation, achieving net time savings of three to four orders of magnitude.
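The sample-then-predict workflow described above can be sketched as follows. This is a minimal illustration, not the paper's method: the toy design space, the synthetic `simulate` function standing in for a cycle-accurate simulator, and the nearest-neighbor predictor standing in for the paper's trained models are all hypothetical.

```python
# Sketch: estimate performance of unsimulated design points from a small
# simulated sample. A 1-nearest-neighbor predictor stands in for the
# paper's learned models; all parameter values and the IPC function are
# synthetic stand-ins.
import itertools
import math
import random

# Toy design space: three interacting parameters (values hypothetical).
space = list(itertools.product(
    [8, 16, 32, 64, 128],   # L2 cache size (KB)
    [1, 2, 4, 8],           # issue width
    [32, 64, 128, 256],     # reorder-buffer entries
))

def simulate(cache_kb, width, rob):
    """Stand-in for a cycle-accurate simulator returning IPC."""
    return (0.3 * math.log2(cache_kb)
            + 0.4 * math.log2(width)
            + 0.2 * math.log2(rob))

random.seed(0)
sample = random.sample(space, 16)               # simulate only a small sample
trained = {pt: simulate(*pt) for pt in sample}  # "teach" the model

def predict(pt):
    # Predict from the nearest sampled neighbor in log-scaled space.
    def dist(a, b):
        return sum((math.log2(x) - math.log2(y)) ** 2 for x, y in zip(a, b))
    return trained[min(trained, key=lambda s: dist(s, pt))]

# Query the cheap model for every point we never simulated.
unseen = [pt for pt in space if pt not in trained]
errors = [abs(predict(pt) - simulate(*pt)) / simulate(*pt) for pt in unseen]
print(f"mean relative error: {100 * sum(errors) / len(errors):.1f}%")
```

Once trained, each prediction costs a table lookup rather than a full simulation, which is what makes exhaustive queries over the remaining design points practical; the paper's neural-network models achieve far lower error than this crude interpolator.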

