Abstract

While hardware transactional memory (HTM) has recently been adopted to construct efficient concurrent search tree structures, such designs fail to deliver scalable performance under contention. In this paper, we first conduct a detailed analysis of an HTM-based concurrent B+Tree, which uncovers several reasons for excessive HTM aborts induced by both false and true conflicts under contention. Based on this analysis, we advocate Eunomia, a design pattern for search trees that comprises several principles for reducing HTM aborts: splitting HTM regions with version-based concurrency control to reduce HTM working sets, a partitioned data layout to reduce false conflicts, proactive detection and avoidance of true conflicts, and adaptive concurrency control. To validate their effectiveness, we apply these principles to construct a scalable concurrent B+Tree using HTM. Evaluation using key-value store benchmarks on a 20-core HTM-capable multi-core machine shows that Eunomia yields a 5X-11X speedup under high contention, while incurring only small overhead under low contention.
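As a rough illustration of the first principle (splitting one large HTM region into two smaller ones that are linked by a version check), the following minimal sketch may help. It is not taken from the paper: the `Leaf`, `Snapshot`, `locate`, and `insert_if_unchanged` names are hypothetical, the leaf capacity of 4 is arbitrary, and plain functions stand in for the two hardware transactions, so the sketch shows only the validation logic and is not itself thread-safe.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical leaf node: the version counter is bumped on every update.
struct Leaf {
    std::atomic<std::uint64_t> version{0};
    int keys[4] = {0, 0, 0, 0};
    int count = 0;
};

// What the first region hands to the second one.
struct Snapshot {
    Leaf* leaf;
    std::uint64_t version;
};

// Stand-in for the first (traversal) region: find the leaf and record
// the version it had at that moment. A real tree would walk from the root.
Snapshot locate(Leaf& leaf) {
    return Snapshot{&leaf, leaf.version.load(std::memory_order_acquire)};
}

// Stand-in for the second (update) region: revalidate the version before
// touching the leaf. Returns false when the caller has to retry from locate().
bool insert_if_unchanged(const Snapshot& s, int key) {
    Leaf& leaf = *s.leaf;
    if (leaf.version.load(std::memory_order_acquire) != s.version) {
        return false;  // the leaf changed between the two regions
    }
    if (leaf.count == 4) {
        return false;  // leaf is full; a real tree would split here
    }
    leaf.keys[leaf.count++] = key;
    leaf.version.fetch_add(1, std::memory_order_release);
    return true;
}
```

In this sketch a writer that observes a stale version simply retries from `locate`, so each region only has to keep a small working set; in an HTM setting each of the two functions would presumably be wrapped in its own transaction.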
Eunomia: Scaling Concurrent Search Trees under Contention Using HTM
PPoPP '17: Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming