Abstract
Previous work has shown how to insert fences that enforce sequential consistency. However, for many concurrent algorithms, sequential consistency is unnecessarily strong and can lead to high execution overhead. The reason is that, often, correctness relies on the execution order of a few specific pairs of instructions. Algorithm designers can declare those execution orders and thereby enable memory-model-independent reasoning about correctness and also ease implementation of algorithms on multiple platforms. The literature has examples of such reasoning, while tool support for enforcing the orders has been lacking until now. In this paper we present a declarative approach to specify and enforce execution orders. Our fence insertion algorithm first identifies the execution orders that a given memory model enforces automatically, and then inserts fences that enforce the rest. Our benchmarks include three off-the-shelf transactional memory algorithms written in C/C++ for which we specify suitable execution orders. For those benchmarks, our experiments with the x86 and ARMv7 memory models show that our tool inserts fences that are competitive with those inserted by the original authors. Our tool is the first to insert fences into transactional memory algorithms and it solves the long-standing problem of how to easily port such algorithms to a novel memory model.
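The two-step idea in the abstract (discard the orders the memory model already enforces, then fence the rest) can be sketched for straight-line code as follows. This is a minimal illustration under stated assumptions, not Parry's algorithm or API: the function name `fences_needed`, the `PRESERVED` table, and the instruction labels are all hypothetical. The memory models are also heavily simplified. x86 is taken to relax only store-to-load order, and ARMv7 is taken to preserve nothing, which assumes there are no dependencies between the accesses. The placement step is a plain greedy interval-stabbing pass standing in for whatever the tool actually does on real C/C++ control flow.

```python
# Hypothetical sketch: the names and the simplified memory-model table are
# assumptions for illustration, not Parry's actual API or algorithm.

# (earlier kind, later kind) pairs that each model keeps in program order.
# Assumption: x86 relaxes only store->load; the ARMv7 entry assumes no
# dependencies between accesses, so no pair is preserved.
PRESERVED = {
    "x86": {("load", "load"), ("load", "store"), ("store", "store")},
    "armv7": set(),
}


def fences_needed(program, orders, model):
    """Return positions i such that a full fence goes right after program[i].

    program: list of (label, kind) in program order; kind is "load" or "store".
    orders:  list of (label_a, label_b) pairs, a before b in program order,
             that the algorithm designer declares must execute in that order.
    """
    index = {label: i for i, (label, _) in enumerate(program)}
    kind = dict(program)

    # Step 1: drop the declared orders that the memory model enforces already.
    remaining = [
        (index[a], index[b])
        for a, b in orders
        if (kind[a], kind[b]) not in PRESERVED[model]
    ]

    # Step 2: greedy interval stabbing. Visit pairs by position of the later
    # instruction; add a fence just before it unless an earlier fence
    # already separates the pair.
    fences = []
    for start, end in sorted(remaining, key=lambda pair: pair[1]):
        if not any(start <= f < end for f in fences):
            fences.append(end - 1)
    return fences


# Made-up instruction sequence and declared orders.
program = [
    ("w_lock", "store"),
    ("r_version", "load"),
    ("w_data", "store"),
    ("r_flag", "load"),
]
orders = [("w_lock", "r_version"), ("w_lock", "w_data"), ("r_version", "r_flag")]

print(fences_needed(program, orders, "x86"))    # [0]
print(fences_needed(program, orders, "armv7"))  # [0, 2]
```

Under these assumptions, the same declared orders need one fence on x86 and two on ARMv7, which is the sense in which declaring orders once lets the fence placement vary per platform.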
Supplemental Material
This archive includes a repository snapshot of Parry, the tool described in our paper. The README.md contains extensive instructions on how to use the tool and reproduce the results found in the paper. Questions, concerns, and bugs can be posted at: https://bitbucket.org/ucla-pls/parry/issues
Declarative fence insertion
OOPSLA 2015: Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications