research-article

Benchmarking weak memory models

Published: 27 February 2016

Abstract

To achieve good multi-core performance, modern microprocessors have weak memory models, rather than enforce sequential consistency. This gives the programmer a wide scope for choosing exactly how to implement various aspects of inter-thread communication through the system's shared memory. However, these choices come with both semantic and performance consequences, often in tension with each other. In this paper, we focus on the performance side, and define techniques for evaluating the impact of various choices in using weak memory models, such as where to put fences, and which fences to use. We make no attempt to judge certain strategies as best or most efficient, and instead provide the techniques that will allow the programmer to understand the performance implications when identifying and resolving any semantic/performance trade-offs. In particular, our technique supports the reasoned selection of macrobenchmarks to use in investigating trade-offs in using weak memory models. We demonstrate our technique on both synthetic benchmarks and real-world applications for the Linux Kernel and OpenJDK Hotspot Virtual Machine on the ARMv8 and POWERv7 architectures.

  • Published in

    ACM SIGPLAN Notices, Volume 51, Issue 8
    PPoPP '16
    August 2016
    405 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/3016078
    • ACM Conferences
      PPoPP '16: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
      February 2016
      420 pages
      ISBN:9781450340922
      DOI:10.1145/2851141

    Copyright © 2016 ACM

    Publisher

    Association for Computing Machinery

    New York, NY, United States
