Exploiting reference idempotency to reduce speculative storage overflow

Published: 01 September 2006

Abstract

Recent proposals for multithreaded architectures employ speculative execution to allow threads with unknown dependences to execute speculatively in parallel. The architectures use hardware speculative storage to buffer speculative data, track data dependences, and correct incorrect executions through roll-backs. Because all memory references access the speculative storage, current proposals implement speculative storage using small memory structures to achieve fast access. The limited capacity of the speculative storage causes considerable performance loss due to speculative storage overflow whenever a thread's speculative state exceeds the speculative storage capacity. Larger threads exacerbate the overflow problem but are preferable to smaller threads, as larger threads uncover more parallelism.

In this article, we discover a new program property called memory reference idempotency. Idempotent references are guaranteed to be eventually corrected, though the references may be temporarily incorrect in the process of speculation. Therefore, idempotent references, even from nonparallelizable program sections, need not be tracked in the speculative storage, and instead can directly access nonspeculative storage (i.e., the conventional memory hierarchy). Thus, we reduce the demand for speculative storage space in large threads. We define a formal framework for reference idempotency and present a novel compiler-assisted speculative execution model. We prove the necessary and sufficient conditions for reference idempotency using our model. We present a compiler algorithm to label idempotent memory references for the hardware. Experimental results show that for our benchmarks, over 60% of the references in nonparallelizable program sections are idempotent.
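The intuition behind bypassing the speculative storage can be sketched in a toy simulation. This is a hypothetical illustration, not the paper's hardware model or compiler algorithm: a fixed-capacity speculative buffer tracks ordinary speculative stores, while stores that the compiler has labeled idempotent (here simply passed in as a flag, since their eventual correction is guaranteed) write directly to conventional memory and never occupy a buffer entry.

```python
# Toy sketch (assumed model, not the paper's implementation): a speculative
# buffer with limited capacity. Stores labeled idempotent bypass the buffer
# and write directly to conventional memory, reducing overflow pressure.

class SpeculativeBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}           # address -> buffered speculative value
        self.overflowed = False

    def store(self, addr, value, idempotent, memory):
        if idempotent:
            memory[addr] = value    # bypass: eventual correction is guaranteed
            return
        if addr not in self.entries and len(self.entries) >= self.capacity:
            self.overflowed = True  # would trigger a costly stall or rollback
            return
        self.entries[addr] = value

    def commit(self, memory):
        """On successful speculation, drain buffered values to memory."""
        memory.update(self.entries)
        self.entries.clear()

memory = {}
buf = SpeculativeBuffer(capacity=2)
# Three distinct speculative stores; one is labeled idempotent by the compiler.
buf.store(0x10, 1, idempotent=False, memory=memory)
buf.store(0x20, 2, idempotent=True,  memory=memory)  # never touches the buffer
buf.store(0x30, 3, idempotent=False, memory=memory)
buf.commit(memory)
print(buf.overflowed)  # False: only two entries ever occupied the buffer
```

With all three stores tracked, a two-entry buffer would overflow; labeling even one reference idempotent keeps the thread's speculative footprint within capacity, which is the effect the article exploits for large threads.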

