DOI: 10.1145/1736020.1736035 (ASPLOS conference proceedings)
research-article

Decoupling contention management from scheduling

Published: 13 March 2010

ABSTRACT

Many parallel applications exhibit unpredictable communication between threads, leading to contention for shared objects. The choice of contention management strategy strongly affects the performance and scalability of these applications: spinning provides maximum performance but wastes significant processor resources, while blocking-based approaches conserve processor resources but introduce high overheads on the critical path of computation. Under high or changing load, the operating system complicates matters further with arbitrary scheduling decisions that often preempt lock holders, leading to long serialization delays until the preempted thread resumes execution.

We observe that contention management is orthogonal to the problems of scheduling and load management and propose to decouple them so each may be solved independently and effectively. To this end, we propose a load control mechanism which manages the number of active threads in the system separately from any contention which may exist. By isolating contention management from damaging interactions with the OS scheduler, we combine the efficiency of spinning with the robustness of blocking. The proposed load control mechanism results in stable, high performance for both lightly and heavily loaded systems, requires no special privileges or modifications at the OS level, and can be implemented as a library which benefits existing code.
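
The separation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not describe the mechanism's internals, so this example substitutes a fixed cap on active threads (a plain semaphore) for the proposed load control. The names `LoadControl`, `SpinLock`, and `run_demo` are hypothetical. The point it shows is only the structure: threads that exceed the load target sleep outside any lock, while the lock itself does nothing but spin.

```python
import threading
import time


class LoadControl:
    """Hypothetical helper: caps how many threads may run at once.

    Excess threads sleep here, before they contend for any lock.
    """

    def __init__(self, max_active):
        self._slots = threading.Semaphore(max_active)
        self._guard = threading.Lock()  # protects the bookkeeping below
        self.active = 0
        self.peak_active = 0

    def __enter__(self):
        self._slots.acquire()  # sleeps while the system is at its load target
        with self._guard:
            self.active += 1
            self.peak_active = max(self.peak_active, self.active)
        return self

    def __exit__(self, *exc):
        with self._guard:
            self.active -= 1
        self._slots.release()
        return False


class SpinLock:
    """Lock whose waiters retry instead of sleeping on the lock."""

    def __init__(self):
        self._flag = threading.Lock()  # only ever try-acquired

    def __enter__(self):
        while not self._flag.acquire(blocking=False):
            time.sleep(0)  # yield the interpreter, then retry
        return self

    def __exit__(self, *exc):
        self._flag.release()
        return False


def run_demo(num_threads=6, max_active=2, iterations=200):
    load = LoadControl(max_active)
    lock = SpinLock()
    counter = [0]

    def worker():
        for _ in range(iterations):
            with load:      # load management: may sleep, holds no lock
                with lock:  # contention management: spin only
                    counter[0] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0], load.peak_active
```

Because a thread takes a load-control slot before it touches the spin lock, this sketch never puts a lock holder to sleep itself; only threads that hold nothing wait for a slot. A real system would presumably adjust the cap as load changes, which this fixed-cap sketch does not attempt.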
