Abstract
Efficient transaction processing over large databases is a key requirement for many mission-critical applications. Although modern databases have achieved good performance through horizontal partitioning, their performance deteriorates when cross-partition distributed transactions have to be executed. This article presents SolarDB, a distributed relational database system that has been successfully tested at a large commercial bank. The key features of SolarDB include (1) a shared-everything architecture based on a two-layer log-structured merge-tree; (2) a new concurrency control algorithm that works with the log-structured storage and ensures efficient, non-blocking transaction processing even when the storage layer is compacting data among nodes in the background; and (3) fine-grained data access to effectively minimize and balance network communication within the cluster. According to our empirical evaluations on TPC-C, Smallbank, and a real-world workload, SolarDB outperforms existing shared-nothing systems by up to 50x when close to 5% or more of the transactions are distributed.
SolarDB: Toward a Shared-Everything Database on Distributed Log-Structured Storage