The Scalable Commutativity Rule: Designing Scalable Software for Multicore Processors

Published: 20 January 2015

Abstract

What opportunities for multicore scalability are latent in software interfaces, such as system call APIs? Can scalability challenges and opportunities be identified even before any implementation exists, simply by considering interface specifications? To answer these questions, we introduce the scalable commutativity rule: whenever interface operations commute, they can be implemented in a way that scales. This rule is useful throughout the development process for scalable multicore software, from the interface design through implementation, testing, and evaluation.
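As a rough illustration of what it means for interface operations to commute, the following sketch runs two operations in both orders on copies of the same state and compares the results. The `Counter` model and the `commutes` helper are hypothetical names chosen for this example. The check is plain state-based commutativity on a toy interface, not the SIM commutativity that the article defines.

```python
import copy


class Counter:
    """Toy interface model (hypothetical, for illustration only)."""

    def __init__(self):
        self.value = 0

    def inc(self):
        # Returns nothing, so a caller cannot observe the order of calls.
        self.value += 1
        return None

    def inc_and_get(self):
        # Returns the new value, which exposes the order of calls.
        self.value += 1
        return self.value


def commutes(state, op_a, op_b):
    """Run op_a and op_b in both orders on copies of state.

    The pair commutes (in this simplified sense) if each operation
    returns the same result in both orders and the final states match.
    """
    s1 = copy.deepcopy(state)
    ra1 = op_a(s1)
    rb1 = op_b(s1)

    s2 = copy.deepcopy(state)
    rb2 = op_b(s2)
    ra2 = op_a(s2)

    return (ra1, rb1) == (ra2, rb2) and vars(s1) == vars(s2)


print(commutes(Counter(), Counter.inc, Counter.inc))                  # True
print(commutes(Counter(), Counter.inc_and_get, Counter.inc_and_get))  # False
```

The two operations change the state identically; only the return value differs. Under the rule, the first interface leaves room for an implementation that scales, for example one that keeps a separate count per core, while the second forces callers to agree on an order.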

This article formalizes the scalable commutativity rule. This requires defining a novel form of commutativity, SIM commutativity, that lets the rule apply even to complex and highly stateful software interfaces.

We also introduce a suite of software development tools based on the rule. Our Commuter tool accepts high-level interface models, generates tests of interface operations that commute and hence could scale, and uses these tests to systematically evaluate the scalability of implementations. We apply Commuter to a model of 18 POSIX file and virtual memory system operations. Using the resulting 26,238 scalability tests, Commuter highlights Linux kernel problems previously observed to limit application scalability and identifies previously unknown bottlenecks that may be triggered by future workloads or hardware.
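The idea of turning an interface model into tests of commuting operations can be sketched on a very small scale. The example below is not Commuter and does not use its model format or its analysis; it substitutes brute-force enumeration over a hypothetical three-slot file descriptor table, and every name in it (`FDS`, `open_lowest`, `make_close`, `generate_tests`) is an assumption made for illustration.

```python
from itertools import combinations, combinations_with_replacement, product

FDS = (0, 1, 2)  # tiny descriptor space so enumeration stays exhaustive


def open_lowest(state):
    """Toy open-like call: allocate the lowest free descriptor."""
    free = [fd for fd in FDS if fd not in state]
    if not free:
        return state, -1  # table full: error, state unchanged
    return state | {free[0]}, free[0]


def make_close(fd):
    """Toy close(fd): succeeds only if fd is currently open."""
    def close(state):
        if fd in state:
            return state - {fd}, 0
        return state, -1
    close.__name__ = f"close({fd})"
    return close


OPS = [open_lowest] + [make_close(fd) for fd in FDS]


def commutes(state, a, b):
    """True if both orders give the same results and the same final state."""
    s, ra1 = a(state)
    s_ab, rb1 = b(s)
    s, rb2 = b(state)
    s_ba, ra2 = a(s)
    return (ra1, rb1, s_ab) == (ra2, rb2, s_ba)


def generate_tests():
    """Enumerate every (initial state, operation pair); keep commuting cases."""
    states = [frozenset(c)
              for n in range(len(FDS) + 1)
              for c in combinations(FDS, n)]
    pairs = combinations_with_replacement(OPS, 2)
    return [(st, a.__name__, b.__name__)
            for st, (a, b) in product(states, pairs)
            if commutes(st, a, b)]


if __name__ == "__main__":
    for state, a, b in generate_tests():
        print(sorted(state), a, b)
```

Each emitted triple names a starting state and a pair of calls that commute in the toy model, so it could serve as a test case: run the two calls concurrently against a real implementation and check whether they contend. Two `open_lowest` calls from an empty table are not emitted, because each call's return value depends on which one runs first.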

Finally, we apply the scalable commutativity rule and Commuter to the design and implementation of sv6, a new POSIX-like operating system. sv6’s novel file and virtual memory system designs enable it to scale for 99% of the tests generated by Commuter. These results translate to linear scalability on an 80-core x86 machine for applications built on sv6’s commutative operations.

Published in

ACM Transactions on Computer Systems, Volume 32, Issue 4
January 2015
124 pages
ISSN: 0734-2071
EISSN: 1557-7333
DOI: 10.1145/2723895

Copyright © 2015 Owner/Author

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 20 January 2015
• Received: 1 October 2014
• Accepted: 1 October 2014

Qualifiers

• research-article
• Research
• Refereed
