
Multilevel Phase Analysis

Published: 09 March 2015

Abstract

Phase analysis, which groups execution intervals with similar execution behavior and resource requirements into phases, has been widely used in a variety of systems, including dynamic cache reconfiguration, prefetching, race detection, and sampling simulation. Although phase granularity is a major factor in the accuracy of phase analysis, it has not been well investigated, and most systems adopt a fine-grained scheme. However, such a scheme accounts only for recent local phase information and can be frequently disturbed by transient noise from abrupt phase changes, which may notably limit its accuracy.
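The fine-grained baseline mentioned above can be sketched as a simple Markov next-phase predictor: it maps the most recently observed phase ID to its most frequent successor. This is an illustrative sketch only; the class and method names are hypothetical, and the article's actual predictor may differ in table organization and tie-breaking.

```python
# Hypothetical sketch of a Markov next-phase predictor over fine-grained
# interval phase IDs. The transition table maps the last observed phase
# ID to a histogram of its successors.
from collections import defaultdict, Counter

class MarkovPhasePredictor:
    def __init__(self):
        self.transitions = defaultdict(Counter)  # phase -> successor counts
        self.last_phase = None

    def observe(self, phase_id):
        """Record the phase of the interval that just finished executing."""
        if self.last_phase is not None:
            self.transitions[self.last_phase][phase_id] += 1
        self.last_phase = phase_id

    def predict_next(self):
        """Predict the next interval's phase; fall back to the last phase."""
        counts = self.transitions.get(self.last_phase)
        if not counts:
            return self.last_phase
        return counts.most_common(1)[0][0]

# Example: phases alternate A, B, A, B, ...
p = MarkovPhasePredictor()
for ph in ["A", "B", "A", "B", "A"]:
    p.observe(ph)
print(p.predict_next())  # "B"
```

Because such a predictor sees only the immediately preceding phase, a single noisy interval can mislead it, which is the weakness MLPA targets.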

In this article, we make the first investigation into the potential of multilevel phase analysis (MLPA), in which phase analyses of different granularities are combined to improve overall accuracy. The key observation is that coarse-grained intervals belonging to the same phase usually consist of a stable distribution of fine-grained phases. Moreover, the phase of a coarse-grained interval can be accurately identified from the fine-grained intervals at the beginning of its execution. Based on this observation, we design and implement an MLPA scheme. In this scheme, a coarse-grained phase is first identified from the fine-grained intervals at the beginning of its execution. The subsequent fine-grained phases within it are then predicted from the sequence of fine-grained phases in the coarse-grained phase. Experimental results show that this scheme notably improves prediction accuracy. Using a Markov fine-grained phase predictor as the baseline, MLPA improves prediction accuracy by 20%, 39%, and 29% for next-phase, phase-change, and phase-length prediction on SPEC2000, respectively, while incurring only about 2% time overhead and 40% space overhead (about 360 bytes in total). To demonstrate the effectiveness of MLPA, we apply it to a dynamic cache reconfiguration system that dynamically adjusts the cache size to reduce the power consumption and access time of the data cache. Experimental results show that MLPA further reduces the average cache size by 15% compared to the fine-grained scheme.
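The two-step MLPA idea described above can be sketched as follows: a coarse-grained phase is identified by matching the first few fine-grained phase IDs against stored signatures, after which the rest of its fine-grained phase sequence is predicted from the stored sequence. All names, the prefix length, and the exact-match lookup are illustrative assumptions; the article's actual identification mechanism may use similarity thresholds rather than exact prefixes.

```python
# Hypothetical sketch of multilevel phase prediction: identify the
# running coarse-grained phase from a short prefix of fine-grained phase
# IDs, then predict the remaining fine-grained phases from the sequence
# remembered for that coarse phase.

class MultilevelPredictor:
    def __init__(self, prefix_len=3):
        self.prefix_len = prefix_len
        # Maps a fine-grained phase-ID prefix to (coarse ID, full sequence).
        self.signatures = {}

    def learn(self, coarse_id, fine_sequence):
        """Remember a coarse-grained phase by the prefix of its fine phases."""
        prefix = tuple(fine_sequence[:self.prefix_len])
        self.signatures[prefix] = (coarse_id, list(fine_sequence))

    def identify(self, observed_prefix):
        """Identify the running coarse phase and predict its remaining
        fine-grained phases from the stored sequence."""
        key = tuple(observed_prefix[:self.prefix_len])
        if key not in self.signatures:
            return None, []
        coarse_id, seq = self.signatures[key]
        return coarse_id, seq[len(observed_prefix):]

mlpa = MultilevelPredictor()
mlpa.learn("C1", ["A", "A", "B", "B", "A", "C"])
coarse, upcoming = mlpa.identify(["A", "A", "B"])
print(coarse, upcoming)  # C1 ['B', 'A', 'C']
```

Once the coarse phase is recognized, every subsequent fine-grained prediction within it comes from the stored sequence rather than from local history, which is why transient noise has less effect.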

Moreover, for MLPA, we also observe that coarse-grained phases can better capture the overall program characteristics with fewer phases, and that the last representative phase can be classified at a very early point in the program, leading to fewer execution intervals being functionally simulated. Based on this observation, we also design a multilevel sampling simulation technique that combines both fine- and coarse-grained phase analysis for sampling simulation. This scheme uses fine-grained simulation points to represent only the selected coarse-grained simulation points instead of the entire program execution; thus, it can further reduce both the functional and detailed simulation time. Experimental results show that MLPA for sampling simulation achieves a speedup in simulation time of about 8.3X with similar accuracy compared to 10M SimPoint.
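The multilevel sample-selection idea above can be sketched as follows: choose one representative interval per coarse-grained phase (weighted by how many intervals that phase covers), then place fine-grained simulation points only inside those representatives rather than across the whole execution, as plain SimPoint does. Everything here is an illustrative assumption; the article's actual clustering, representative selection, and weighting differ.

```python
# Hypothetical sketch of multilevel simulation-point selection. For each
# coarse phase we keep one representative coarse interval and emit one
# fine-grained simulation point per distinct fine phase inside it, with
# weights that sum to 1 across all points.
from collections import defaultdict

def select_simulation_points(coarse_phases, fine_phases_per_interval):
    """coarse_phases: list of coarse phase IDs, one per coarse interval.
    fine_phases_per_interval: parallel list of fine phase-ID sequences."""
    groups = defaultdict(list)
    for idx, cp in enumerate(coarse_phases):
        groups[cp].append(idx)
    points = []  # (coarse interval index, fine offset, weight)
    for cp, members in groups.items():
        rep = members[0]  # first interval of the phase as representative
        coarse_weight = len(members) / len(coarse_phases)
        distinct = set(fine_phases_per_interval[rep])
        seen = set()
        for offset, fp in enumerate(fine_phases_per_interval[rep]):
            if fp not in seen:
                seen.add(fp)
                points.append((rep, offset, coarse_weight / len(distinct)))
    return points

pts = select_simulation_points(
    ["X", "X", "Y"],
    [["A", "A", "B"], ["A", "B", "B"], ["C", "C", "C"]],
)
print(len(pts))  # 3 points: two inside interval 0, one inside interval 2
```

Because fine-grained points are drawn only from the representative coarse intervals, both the number of detailed-simulation points and the span of execution that must be functionally simulated shrink.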

