Abstract
Data structures can benefit from dynamic data layout modifications when the size or shape of the data structure changes during execution, or when different phases of the program execute different workloads. However, in a modern multi-core environment, layout modifications incur costly synchronization overhead. In this paper we propose a novel layout lock that incurs negligible overhead for reads and a small overhead for updates of the data structure. We then demonstrate, for two data structures, both the benefits of layout changes and the advantages of the layout lock as their supporting synchronization mechanism. In particular, we propose a concurrent binary search tree and a concurrent array set that benefit from concurrent layout modifications using the proposed layout lock. Our experience demonstrates performance advantages and simple integration.
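The roles named in the abstract (reads, updates, and layout modifications) can be illustrated with a minimal sketch. It is not the paper's algorithm: the class `LayoutLockSketch` and its methods `operation` and `layout_change` are hypothetical names, and the sketch assumes that ordinary reads and updates may overlap with one another while a layout change needs exclusive access.

```python
import threading
from contextlib import contextmanager


class LayoutLockSketch:
    """Hypothetical stand-in, not the paper's lock.

    Ordinary operations (reads and updates) hold the lock in shared mode;
    a layout change holds it exclusively. This is an assumed division of roles.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._active_ops = 0           # reads/updates currently running
        self._layout_changing = False  # True while a layout change owns the lock

    @contextmanager
    def operation(self):
        """Shared mode: any number of reads and updates may overlap."""
        with self._cond:
            while self._layout_changing:
                self._cond.wait()
            self._active_ops += 1
        try:
            yield
        finally:
            with self._cond:
                self._active_ops -= 1
                if self._active_ops == 0:
                    self._cond.notify_all()

    @contextmanager
    def layout_change(self):
        """Exclusive mode: waits for running operations, blocks new ones."""
        with self._cond:
            while self._layout_changing:
                self._cond.wait()
            self._layout_changing = True   # new operations now wait
            while self._active_ops > 0:    # drain operations already running
                self._cond.wait()
        try:
            yield
        finally:
            with self._cond:
                self._layout_changing = False
                self._cond.notify_all()
```

A data structure would wrap each lookup or update in `with lock.operation():` and each reorganization of its memory layout in `with lock.layout_change():`. Every operation in this sketch acquires a mutex, so it does not reproduce the negligible read overhead the abstract claims; it only shows which roles exclude one another.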
Layout Lock: A Scalable Locking Paradigm for Concurrent Data Layout Modifications
PPoPP '17: Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming