Abstract
What opportunities for multicore scalability are latent in software interfaces, such as system call APIs? Can scalability challenges and opportunities be identified even before any implementation exists, simply by considering interface specifications? To answer these questions, we introduce the scalable commutativity rule: whenever interface operations commute, they can be implemented in a way that scales. This rule is useful throughout the development process for scalable multicore software, from the interface design through implementation, testing, and evaluation.
This article formalizes the scalable commutativity rule. This requires defining a novel form of commutativity, SIM commutativity, that lets the rule apply even to complex and highly stateful software interfaces.
We also introduce a suite of software development tools based on the rule. Our Commuter tool accepts high-level interface models, generates tests of interface operations that commute and hence could scale, and uses these tests to systematically evaluate the scalability of implementations. We apply Commuter to a model of 18 POSIX file and virtual memory system operations. Using the resulting 26,238 scalability tests, Commuter highlights Linux kernel problems previously observed to limit application scalability and identifies previously unknown bottlenecks that may be triggered by future workloads or hardware.
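As a rough illustration of the idea of deriving tests from an interface model, the sketch below brute-forces a toy file-descriptor model and lists the operation pairs that commute from a given starting state. It is not Commuter's actual analysis, and it checks only plain pairwise commutativity, not the SIM commutativity defined in the article. All names here (`open_lowest`, `close`, `is_open`, `commutes`, `candidate_tests`) are hypothetical and exist only for this example.

```python
from itertools import combinations

# Hypothetical toy model: the state is a frozenset of open descriptor numbers.
# Each operation maps a state to (new_state, return_value).

def open_lowest(state):
    """Allocate the lowest unused descriptor and return it."""
    fd = 0
    while fd in state:
        fd += 1
    return state | {fd}, fd

def close(fd):
    """Build an operation that closes fd (0 on success, -1 if not open)."""
    def op(state):
        if fd in state:
            return state - {fd}, 0
        return state, -1
    op.__name__ = f"close({fd})"
    return op

def is_open(fd):
    """Build a read-only operation that reports whether fd is open."""
    def op(state):
        return state, fd in state
    op.__name__ = f"is_open({fd})"
    return op

def commutes(state, a, b):
    """True if a and b give the same return values and the same final
    state in either order, starting from `state`."""
    s1, ra1 = a(state)
    s2, rb1 = b(s1)
    t1, rb2 = b(state)
    t2, ra2 = a(t1)
    return ra1 == ra2 and rb1 == rb2 and s2 == t2

def candidate_tests(state, ops):
    """Name the operation pairs that commute from `state`; under the rule,
    these are the pairs a scalable implementation could run without conflict."""
    return [(a.__name__, b.__name__)
            for a, b in combinations(ops, 2)
            if commutes(state, a, b)]

start = frozenset({0, 1})
ops = [open_lowest, close(0), close(1), is_open(1)]
tests = candidate_tests(start, ops)
```

In this toy model, two `open_lowest` calls do not commute because each call's return value depends on which one ran first, whereas closing two different descriptors does commute. A pair that commutes in the model is a candidate scalability test for an implementation.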
Finally, we apply the scalable commutativity rule and Commuter to the design and implementation of sv6, a new POSIX-like operating system. sv6’s novel file and virtual memory system designs enable it to scale for 99% of the tests generated by Commuter. These results translate to linear scalability on an 80-core x86 machine for applications built on sv6’s commutative operations.