Abstract
Recent work in the field of value prediction (VP) has shown that, given an efficient confidence estimation mechanism, prediction validation can be removed from the out-of-order engine and delayed until commit time. As a result, a simple recovery mechanism (pipeline squashing) can be used, while the out-of-order engine remains mostly unmodified.
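The idea of confidence-gated prediction with validation at commit time can be sketched as follows. This is only an illustration under stated assumptions: a last-value predictor entry with a plain 3-bit saturating confidence counter stands in for whatever predictor and confidence scheme is actually used, and the names `PredictorEntry`, `CONF_MAX`, and `commit` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

CONF_MAX = 7  # assumption: 3-bit saturating confidence counter


@dataclass
class PredictorEntry:
    """Hypothetical last-value predictor entry (not the paper's predictor)."""
    value: int = 0
    confidence: int = 0

    def predict(self) -> Optional[int]:
        # A prediction is used only once confidence is saturated.
        return self.value if self.confidence == CONF_MAX else None

    def train(self, actual: int) -> None:
        # Correct value: gain confidence. Wrong value: replace and reset.
        if actual == self.value:
            self.confidence = min(CONF_MAX, self.confidence + 1)
        else:
            self.value = actual
            self.confidence = 0


def commit(entry: PredictorEntry, used_prediction: Optional[int], actual: int) -> bool:
    """Validate at commit time; return True if the pipeline must be squashed."""
    squash = used_prediction is not None and used_prediction != actual
    entry.train(actual)
    return squash


entry = PredictorEntry()
for _ in range(8):                      # the instruction keeps producing 5
    commit(entry, entry.predict(), 5)
confident = entry.predict()             # confidence is now saturated
squashed = commit(entry, confident, 9)  # the value changes: misprediction
```

Because only saturated-confidence predictions are ever used, mispredictions are expected to be rare in this sketch, which is what makes a coarse recovery such as squashing acceptable.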
Yet VP and validation at commit time require additional ports on the physical register file, potentially making the total number of ports impractical. Fortunately, VP also implies that many single-cycle ALU instructions have their operands predicted in the front-end and can be executed in-place, in-order. Similarly, the execution of single-cycle instructions whose result has been predicted can be delayed until commit time, since predictions are validated at commit time.
Consequently, a significant number of instructions—10% to 70% in our experiments—can bypass the out-of-order engine, allowing for a reduction of the issue width. This reduction paves the way for a truly practical implementation of VP. Furthermore, since VP in itself usually increases performance, our resulting {Early—Out-of-Order—Late} Execution architecture, EOLE, is often more efficient than a baseline VP-augmented 6-issue superscalar while having a significantly narrower 4-issue out-of-order engine.
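The routing of instructions between the three execution stages described above can be sketched as a simple classifier. This is a minimal illustration, not the authors' implementation: the `Inst` fields, the `route` rule ordering (early execution preferred over late), and the `bypass_fraction` helper are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class Stage(Enum):
    EARLY = "early"        # executed in-order in the front-end
    OOO = "out-of-order"   # dispatched to the out-of-order engine
    LATE = "late"          # executed in-order just before commit


@dataclass
class Inst:
    """Hypothetical per-instruction flags available at rename/dispatch."""
    single_cycle_alu: bool
    operands_predicted: bool  # all source operands predicted in the front-end
    result_predicted: bool    # the instruction's own result was predicted


def route(inst: Inst) -> Stage:
    # Only single-cycle ALU instructions may bypass the out-of-order engine.
    if inst.single_cycle_alu and inst.operands_predicted:
        return Stage.EARLY
    if inst.single_cycle_alu and inst.result_predicted:
        return Stage.LATE
    return Stage.OOO


def bypass_fraction(insts: Sequence[Inst]) -> float:
    """Fraction of instructions that never enter the out-of-order engine."""
    if not insts:
        return 0.0
    bypassed = sum(route(i) is not Stage.OOO for i in insts)
    return bypassed / len(insts)


trace = [
    Inst(single_cycle_alu=True, operands_predicted=True, result_predicted=False),
    Inst(single_cycle_alu=True, operands_predicted=False, result_predicted=True),
    Inst(single_cycle_alu=False, operands_predicted=True, result_predicted=True),
    Inst(single_cycle_alu=True, operands_predicted=False, result_predicted=False),
]
fraction = bypass_fraction(trace)
```

In this toy trace half of the instructions bypass the out-of-order engine; the larger this fraction is on a real workload, the more the issue width can plausibly be reduced.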
EOLE: Combining Static and Dynamic Scheduling Through Value Prediction to Reduce Complexity and Increase Performance