ABSTRACT
Many parallel applications exhibit unpredictable communication between threads, leading to contention for shared objects. The choice of contention management strategy strongly affects the performance and scalability of these applications: spinning provides maximum performance but wastes significant processor resources, while blocking-based approaches conserve processor resources but introduce high overheads on the critical path of computation. Under high or changing load, the operating system complicates matters further with arbitrary scheduling decisions that often preempt lock holders, leading to long serialization delays until the preempted thread resumes execution.
We observe that contention management is orthogonal to the problems of scheduling and load management, and we propose to decouple them so that each may be solved independently and effectively. To this end, we propose a load control mechanism that manages the number of active threads in the system separately from any contention that may exist. By isolating contention management from damaging interactions with the OS scheduler, we combine the efficiency of spinning with the robustness of blocking. The proposed load control mechanism delivers stable, high performance on both lightly and heavily loaded systems, requires no special privileges or modifications at the OS level, and can be implemented as a library that benefits existing code.
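The decoupling described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no algorithmic detail, so the names `LoadController`, `SpinLock`, and `run_demo`, the fixed `max_active` target, and the rule that only spinning threads are parked are all assumptions made for illustration. The lock only spins; a separate controller decides how many threads stay active. Because of Python's global interpreter lock, the sketch shows the structure only and says nothing about performance.

```python
import threading
import time


class LoadController:
    """Hypothetical load controller: caps how many registered threads are active."""

    def __init__(self, max_active):
        self._max_active = max_active  # assumed fixed target; a real policy would adapt it
        self._active = 0
        self._cond = threading.Condition()

    def enter(self):
        """Register the calling thread as active, sleeping while at capacity."""
        with self._cond:
            while self._active >= self._max_active:
                self._cond.wait()
            self._active += 1

    def leave(self):
        """Unregister the calling thread and wake one sleeping thread."""
        with self._cond:
            self._active -= 1
            self._cond.notify()

    def set_target(self, max_active):
        """Change the target, for example in response to a load change."""
        with self._cond:
            self._max_active = max_active
            self._cond.notify_all()

    def overloaded(self):
        # Unsynchronized read: used only as a hint inside spin loops.
        return self._active > self._max_active

    def park_if_overloaded(self):
        """Called by spinning threads only, so a lock holder is never parked here."""
        if self.overloaded():
            self.leave()
            self.enter()


class SpinLock:
    """Test-and-set style spin lock; contention handling never blocks by itself."""

    def __init__(self, controller):
        self._flag = threading.Lock()
        self._controller = controller

    def acquire(self):
        while not self._flag.acquire(blocking=False):
            # Load control is consulted here, separately from the lock protocol.
            self._controller.park_if_overloaded()
            time.sleep(0)  # yield the interpreter so the holder can make progress

    def release(self):
        self._flag.release()


def run_demo(num_threads=8, max_active=4, iterations=200):
    """Run more threads than the target allows and return the shared counter."""
    controller = LoadController(max_active)
    lock = SpinLock(controller)
    counter = [0]

    def worker():
        controller.enter()
        try:
            for _ in range(iterations):
                lock.acquire()
                counter[0] += 1
                lock.release()
        finally:
            controller.leave()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]
```

In this sketch, `run_demo()` starts eight workers but lets at most four be registered as active at a time, and the returned counter equals `num_threads * iterations` because every increment happens under the spin lock. Calling `set_target` with a smaller value while threads are running makes spinning threads park at their next check, while the current lock holder continues.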
Decoupling contention management from scheduling (ASPLOS '10)