
PROFET: Modeling System Performance and Energy Without Simulating the CPU

Published: 19 June 2019

Abstract

The approaching end of DRAM scaling and the expansion of emerging memory technologies are motivating extensive research into future memory systems. Novel memory systems are typically explored with hardware simulators, which are slow and often rely on a simplified or obsolete abstraction of the CPU. This study presents PROFET, an analytical model that predicts how an application's performance and energy consumption change when it is executed on different memory systems. The model is based on instrumentation of an application's execution on actual hardware, so it already accounts for CPU microarchitectural details such as the data prefetcher and the out-of-order engine. PROFET is evaluated on two real platforms, Sandy Bridge-EP E5-2670 and Knights Landing Xeon Phi, with various memory configurations. The evaluation results show that PROFET's predictions are accurate, typically differing by only 2% from the values measured on actual hardware. We release the PROFET source code and all input data required for memory system and application profiling. The released package can be seamlessly installed and used on high-end Intel platforms.

