research-article

Asymmetric Cache Coherency: Policy Modifications to Improve Multicore Performance

Published: 01 October 2012

Abstract

Asymmetric coherency is a new optimization method for coherency policies that supports nonuniform workloads in multicore processors. It assists in load balancing a workload and is applicable to SoC multicores, where applications are not evenly spread among the processors and the coherency policy can be customized. Because asymmetric coherency is a policy change, our designs require little or no additional hardware over an existing system. We explore two types of asymmetric coherency policy. Our bus-based policy generated a 60% reduction in coherency cost (the latency due to coherency messages) for nonshared data. Our directory-based policy, using statically allocated asymmetry, showed up to a 5.8% improvement in execution time and up to a 22% improvement in average memory latency for the parallel benchmark Sha. Dynamically allocated asymmetry generated further improvements in access latency, increasing the effectiveness of asymmetric coherency by up to 73.8% compared to the static asymmetric solution.

      • Published in

        ACM Transactions on Reconfigurable Technology and Systems, Volume 5, Issue 3
        October 2012
        102 pages
        ISSN: 1936-7406
        EISSN: 1936-7414
        DOI: 10.1145/2362374

        Copyright © 2012 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 1 October 2012
        • Accepted: 1 March 2012
        • Revised: 1 February 2012
        • Received: 1 August 2011

        Qualifiers

        • research-article
        • Research
        • Refereed