EOLE: Combining Static and Dynamic Scheduling Through Value Prediction to Reduce Complexity and Increase Performance

Published: 21 April 2016

Abstract

Recent work in the field of value prediction (VP) has shown that given an efficient confidence estimation mechanism, prediction validation could be removed from the out-of-order engine and delayed until commit time. As a result, a simple recovery mechanism—pipeline squashing—can be used, whereas the out-of-order engine remains mostly unmodified.

Yet, VP and validation at commit time require additional ports on the physical register file, potentially rendering the overall number of ports unbearable. Fortunately, VP also implies that many single-cycle ALU instructions have their operands predicted in the front-end and can be executed in-place, in-order. Similarly, the execution of single-cycle instructions whose result has been predicted can be delayed until commit time since predictions are validated at commit time.

Consequently, a significant number of instructions—10% to 70% in our experiments—can bypass the out-of-order engine, allowing for a reduction of the issue width. This reduction paves the way for a truly practical implementation of VP. Furthermore, since VP in itself usually increases performance, our resulting {Early—Out-of-Order—Late} Execution architecture, EOLE, is often more efficient than a baseline VP-augmented 6-issue superscalar while having a significantly narrower 4-issue out-of-order engine.
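The routing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Instr` fields, the `route` function, and the choice to check early execution before late execution are assumptions made for illustration only. The abstract states only that single-cycle ALU instructions with predicted operands can execute in-order in the front-end, and that single-cycle instructions with a predicted result can be deferred until commit.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    EARLY = "early"        # executed in-order in the front-end
    OUT_OF_ORDER = "ooo"   # sent through the out-of-order engine
    LATE = "late"          # executed in-order at commit time


@dataclass
class Instr:
    # All three fields are hypothetical flags, not names from the paper.
    single_cycle_alu: bool     # instruction is a single-cycle ALU operation
    operands_predicted: bool   # its operands are available in the front-end
    result_predicted: bool     # a confident prediction exists for its result


def route(instr: Instr) -> Stage:
    """Pick an execution stage for one instruction (illustrative policy)."""
    if instr.single_cycle_alu and instr.operands_predicted:
        return Stage.EARLY
    if instr.single_cycle_alu and instr.result_predicted:
        return Stage.LATE
    return Stage.OUT_OF_ORDER


def bypass_fraction(trace: list[Instr]) -> float:
    """Fraction of instructions that never enter the out-of-order engine."""
    if not trace:
        return 0.0
    bypassed = sum(1 for i in trace if route(i) is not Stage.OUT_OF_ORDER)
    return bypassed / len(trace)


# A made-up four-instruction trace, not data from the paper.
trace = [
    Instr(single_cycle_alu=True, operands_predicted=True, result_predicted=False),
    Instr(single_cycle_alu=True, operands_predicted=False, result_predicted=True),
    Instr(single_cycle_alu=False, operands_predicted=True, result_predicted=True),
    Instr(single_cycle_alu=True, operands_predicted=False, result_predicted=False),
]
print([route(i).value for i in trace])
print(bypass_fraction(trace))
```

In this toy trace, two of the four instructions avoid the out-of-order engine. The more instructions do so, the less issue bandwidth the engine needs, which is the abstract's argument for narrowing it from 6-issue to 4-issue.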

Published in

ACM Transactions on Computer Systems, Volume 34, Issue 2
May 2016
96 pages
ISSN: 0734-2071
EISSN: 1557-7333
DOI: 10.1145/2912575

Copyright © 2016 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 21 April 2016
• Accepted: 1 December 2015
• Received: 1 November 2015

Qualifiers

• research-article
• Research
• Refereed
