DOI: 10.1145/1007912.1007944 · SPAA 2004 Conference Proceedings

A scalable lock-free stack algorithm

Published: 27 June 2004

Abstract

The literature describes two high-performance concurrent stack algorithms based on combining funnels and elimination trees. Unfortunately, the funnels are linearizable but blocking, and the elimination trees are non-blocking but not linearizable. Neither is used in practice, since they perform well only at exceptionally high loads. The literature also describes a simple lock-free linearizable stack algorithm that works at low loads but does not scale as the load increases. The question of designing a stack algorithm that is non-blocking, linearizable, and scales well throughout the concurrency range has thus remained open.

This paper presents such a concurrent stack algorithm. It is based on the following simple observation: a single elimination array used as a backoff scheme for a simple lock-free stack is lock-free, linearizable, and scalable. As our empirical results show, the resulting elimination-backoff stack performs as well as the simple stack at low loads and increasingly outperforms all other methods (lock-based and non-blocking) as concurrency increases. We believe its simplicity and scalability make it a viable practical alternative to existing constructions for implementing concurrent stacks.
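The observation in the abstract can be illustrated with a minimal Java sketch. This is not the paper's algorithm: the paper uses an elimination *array* of exchange slots chosen at random plus an adaptive backoff policy, whereas this sketch uses a single exchange slot, a fixed spin bound, and a Treiber-style CAS-based stack as the underlying lock-free structure. All names here (`EliminationBackoffStack`, `tryEliminatePush`, `TAKEN`) are illustrative, not from the paper. The key idea survives: when the CAS on `top` fails under contention, a push and a pop can instead meet in the slot and cancel each other out without ever touching the stack.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hedged sketch of an elimination-backoff stack: a Treiber-style
// lock-free stack whose backoff path is a single exchange slot where
// a concurrent push and pop can eliminate each other.
class EliminationBackoffStack<T> {
    private static final class Node<T> {
        final T value; Node<T> next;
        Node(T v) { value = v; }
    }
    private static final Object TAKEN = new Object(); // popper's receipt

    private final AtomicReference<Node<T>> top = new AtomicReference<>();
    // Single slot standing in for the paper's elimination array.
    private final AtomicReference<Object> slot = new AtomicReference<>();

    public void push(T value) {
        Node<T> node = new Node<>(value);
        while (true) {
            Node<T> t = top.get();
            node.next = t;
            if (top.compareAndSet(t, node)) return; // fast path
            if (tryEliminatePush(value)) return;    // backoff path
        }
    }

    private boolean tryEliminatePush(T value) {
        if (!slot.compareAndSet(null, value)) return false; // slot busy
        for (int spin = 0; spin < 1000; spin++) {
            if (slot.get() == TAKEN) { // a pop met us here
                slot.set(null);        // free the slot
                return true;
            }
        }
        // Timed out: withdraw the offer unless a pop just took it.
        if (slot.compareAndSet(value, null)) return false;
        slot.set(null); // it was TAKEN after all
        return true;
    }

    @SuppressWarnings("unchecked")
    public T pop() {
        while (true) {
            Node<T> t = top.get();
            if (t == null) return null;                // empty stack
            if (top.compareAndSet(t, t.next)) return t.value;
            Object offer = slot.get();                 // backoff path
            if (offer != null && offer != TAKEN
                    && slot.compareAndSet(offer, TAKEN))
                return (T) offer;
        }
    }
}
```

Because an eliminated push/pop pair can be linearized at the moment of their exchange, the combined structure stays linearizable, and since every path is bounded by CAS retries rather than locks, it stays lock-free.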


