Threads cannot be implemented as a library

Published: 12 June 2005

Abstract

In many environments, multi-threaded code is written in a language that was originally designed without thread support (e.g. C), to which a library of threading primitives was subsequently added. There appears to be a general understanding that this is not the right approach. We provide specific arguments that a pure library approach, in which the compiler is designed independently of threading issues, cannot guarantee correctness of the resulting code. We first review why the approach almost works, and then examine some of the surprising behavior it may entail. We further illustrate that there are very simple cases in which a pure library-based approach seems incapable of expressing an efficient parallel algorithm. Our discussion takes place in the context of C with Pthreads, since it is commonly used, reasonably well specified, and does not attempt to ensure type-safety, which would entail even stronger constraints. The issues we raise are not specific to that context.
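As a rough illustration of the setting the abstract describes, the sketch below is an ordinary C program that relies only on the Pthreads library for correctness. The names `counter`, `counter_lock`, `worker`, and `run_counter` are hypothetical and chosen for this example. The comment in the middle shows one transformation that would be harmless in single-threaded code but would introduce unlocked accesses here. It is an assumption for illustration, not something a specific compiler is known to emit.

```c
/* Illustrative sketch: counter, counter_lock, worker and run_counter are
 * hypothetical names chosen for this example. Build with: cc -pthread */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

static long counter;  /* shared by all worker threads */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each worker adds 1 to counter `iters` times, always holding the lock. */
static void *worker(void *arg) {
    long iters = *(const long *)arg;
    for (long i = 0; i < iters; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/*
 * A rewrite that preserves single-threaded behavior but is unsafe here
 * (shown only as a comment, as an assumed example of register promotion):
 *
 *     long r = counter;                      // read without the lock
 *     for (...) { lock(); r++; unlock(); }
 *     counter = r;                           // unlocked write, may lose updates
 */

/* Runs nthreads workers and returns the final counter, or -1 on failure. */
long run_counter(int nthreads, long iters) {
    assert(nthreads > 0 && iters >= 0);
    pthread_t *tids = malloc((size_t)nthreads * sizeof *tids);
    if (tids == NULL) return -1;

    counter = 0;
    int started = 0;
    while (started < nthreads &&
           pthread_create(&tids[started], NULL, worker, &iters) == 0)
        started++;
    for (int t = 0; t < started; t++)
        pthread_join(tids[t], NULL);  /* join makes the final read safe */
    free(tids);
    return started == nthreads ? counter : -1;
}
```

Called from a `main` as `run_counter(4, 10000)`, the function is expected to return 40000 as long as every access to `counter` really happens under the lock. The program contains nothing that tells the compiler `counter` is shared, so that guarantee depends on the compiler treating the lock and unlock calls as barriers to moving memory accesses.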


Information

Published In

ACM SIGPLAN Notices  Volume 40, Issue 6
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
June 2005
325 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/1064978
PLDI '05: Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
June 2005
338 pages
ISBN:1595930566
DOI:10.1145/1065010
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 June 2005
Published in SIGPLAN Volume 40, Issue 6

Author Tags

  1. data race
  2. optimization
  3. pthreads
  4. register promotion
  5. threads

Qualifiers

  • Article

