Exploiting reference idempotency to reduce speculative storage overflow

Abstract
Recent proposals for multithreaded architectures employ speculative execution to allow threads with unknown dependences to execute speculatively in parallel. These architectures use hardware speculative storage to buffer speculative data, track data dependences, and correct incorrect executions through rollbacks. Because all memory references access the speculative storage, current proposals implement it using small memory structures to achieve fast access. This limited capacity causes considerable performance loss: the speculative storage overflows whenever a thread's speculative state exceeds its capacity. Larger threads exacerbate the overflow problem, yet they are preferable to smaller threads because they uncover more parallelism.

In this article, we identify a new program property called memory reference idempotency. Idempotent references are guaranteed to be eventually corrected, even though they may be temporarily incorrect during speculation. Therefore, idempotent references, even those from nonparallelizable program sections, need not be tracked in the speculative storage and can instead directly access nonspeculative storage (i.e., the conventional memory hierarchy). This reduces the demand for speculative storage space in large threads. We define a formal framework for reference idempotency and present a novel compiler-assisted speculative execution model. We prove the necessary and sufficient conditions for reference idempotency under our model, and we present a compiler algorithm that labels idempotent memory references for the hardware. Experimental results show that, for our benchmarks, over 60% of the references in nonparallelizable program sections are idempotent.
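The intuition behind idempotency can be illustrated with a minimal sketch. This is not the paper's formal model; it is a hypothetical simulation in which a "memory location" is idempotent because re-execution after a rollback recomputes it from scratch, so any temporarily wrong value is overwritten. Such a location can safely bypass the speculative buffer and be written directly to main memory. All names (`run_thread`, `speculative_execute`, the `predicted_x` input) are illustrative assumptions.

```python
# Illustrative sketch only (not the paper's formalism).
# 'mem' stands in for nonspeculative storage (the conventional memory
# hierarchy); the thread writes to it directly, with no speculative buffer.

def run_thread(mem, x):
    # 'result' is recomputed from its input on every execution, so a stale
    # value left behind by a squashed speculative run is always overwritten:
    # in this sketch, the reference is idempotent (eventually corrected).
    mem["result"] = x * 2

def speculative_execute(mem, predicted_x, actual_x):
    run_thread(mem, predicted_x)      # speculative run; input may be wrong
    if predicted_x != actual_x:       # dependence violation detected
        run_thread(mem, actual_x)     # rollback: squash and re-execute
    return mem

mem = {}
speculative_execute(mem, predicted_x=3, actual_x=5)

# Compare against a purely sequential (nonspeculative) execution.
ref = {}
run_thread(ref, 5)
assert mem == ref          # the temporarily wrong value (6) was corrected
print(mem["result"])       # -> 10
```

Note that a reference is only idempotent if re-execution fully overwrites the speculative value; a location updated in place (e.g., `mem["result"] += x`) would not qualify, which is why the hardware still needs the compiler to label which references may bypass the speculative storage.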