Abstract
Load value speculation has long been proposed as a method to hide the latency of memory accesses. It has seen very limited use in actual processors, often because of the high overhead of reexecuting misspeculated computations. We present PreCoRe, a framework that generates application-specific microarchitectures supporting load value speculation on reconfigurable computers. The article examines the lightweight speculation and replay mechanisms, the architecture of the data value prediction units, and the impact on the nonspeculative parts of the memory system. In experiments, PreCoRe achieved speedups of up to 2.48 times over nonspeculative implementations.
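The general idea of load value speculation can be illustrated with a small software model. The sketch below is not PreCoRe's design: it assumes a simple last-value predictor with a saturating confidence counter, and the names `LastValuePredictor`, `simulate`, `threshold`, and `max_conf` are hypothetical. It only shows the three outcomes a speculating load can have: no prediction (wait for memory), a correct prediction (latency hidden), and a misprediction (dependent computations must be replayed).

```python
from dataclasses import dataclass, field


@dataclass
class LastValuePredictor:
    """Hypothetical last-value predictor; not the predictor used in the article."""
    threshold: int = 2   # assumed minimum confidence before speculating
    max_conf: int = 3    # assumed saturation limit of the confidence counter
    table: dict = field(default_factory=dict)  # addr -> (last_value, confidence)

    def predict(self, addr):
        """Return a speculated value, or None when confidence is too low."""
        value, conf = self.table.get(addr, (None, 0))
        return value if conf >= self.threshold else None

    def update(self, addr, actual):
        """Train the predictor with the value memory actually returned."""
        value, conf = self.table.get(addr, (None, 0))
        if value == actual:
            conf = min(conf + 1, self.max_conf)  # same value again: more confident
        else:
            conf = 0                             # value changed: start over
        self.table[addr] = (actual, conf)


def simulate(loads):
    """Process (addr, actual_value) loads in order and count the outcomes."""
    predictor = LastValuePredictor()
    stats = {"hits": 0, "replays": 0, "waits": 0}
    for addr, actual in loads:
        guess = predictor.predict(addr)
        if guess is None:
            stats["waits"] += 1     # no speculation: wait for the memory access
        elif guess == actual:
            stats["hits"] += 1      # correct speculation: latency is hidden
        else:
            stats["replays"] += 1   # misspeculation: dependents are reexecuted
        predictor.update(addr, actual)
    return stats


if __name__ == "__main__":
    # Five loads of the same value from one address, then a changed value.
    print(simulate([(0x10, 7)] * 5 + [(0x10, 9)]))
```

In this toy model the cost of a replay is just a counter increment. In real hardware that cost is the overhead the abstract names as the main obstacle, which is why a confidence threshold is used here to avoid speculating on values that have not yet proven stable.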
Memory Latency Hiding by Load Value Speculation for Reconfigurable Computers