skip to main content
research-article

SQLite: past, present, and future

Published: 01 August 2022 Publication History

Abstract

In the two decades following its initial release, SQLite has become the most widely deployed database engine in existence. Today, SQLite is found in nearly every smartphone, computer, web browser, television, and automobile. Several factors are likely responsible for its ubiquity, including its in-process design, standalone codebase, extensive test suite, and cross-platform file format. While it supports complex analytical queries, SQLite is primarily designed for fast online transaction processing (OLTP), employing row-oriented execution and a B-tree storage format. However, fueled by the rise of edge computing and data science, there is a growing need for efficient in-process online analytical processing (OLAP). DuckDB, a database engine nicknamed "the SQLite for analytics", has recently emerged to meet this demand. While DuckDB has shown strong performance on OLAP benchmarks, it is unclear how SQLite compares. Furthermore, we are aware of no work that attempts to identify root causes for SQLite's performance behavior on OLAP workloads. In this paper, we discuss SQLite in the context of this changing workload landscape. We describe how SQLite evolved from its humble beginnings to the full-featured database engine it is today. We evaluate the performance of modern SQLite on three benchmarks, each representing a different flavor of in-process data management, including transactional, analytical, and blob processing. We delve into analytical data processing on SQLite, identifying key bottlenecks and weighing potential solutions. As a result of our optimizations, SQLite is now up to 4.2X faster on SSB. Finally, we discuss the future of SQLite, envisioning how it will evolve to meet new demands and challenges.

References

[1]
2021. Recommended Formats Statement. Technical Report. United States Library of Congress.
[2]
R. Barber, G. Lohman, I. Pandis, V. Raman, R. Sidle, G. Attaluri, N. Chainani, S. Lightstone, and D. Sharpe. 2014. Memory-Efficient Hash Joins. In Proceedings of the VLDB Endowment (PVLDB), Vol. 8. 353--364.
[3]
Burton H. Bloom. 1970. Space/Time Trade-Offs in Hash Coding with Allowable Errors. Commun. ACM 13, 7 (1970), 422--426.
[4]
Ming-Syan Chen, Hui-I Hsiao, and Philip S. Yu. 1993. Applying Hash Filters to Improving the Execution of Bushy Trees. In 19th International Conference on Very Large Databases. 506--516.
[5]
Ming-Syan Chen, Hui-I Hsiao, and Philip S. Yu. 1997. On applying hash filters to improving the execution of multi-join queries. The VLDB Journal 6, 2 (1997), 121--131.
[6]
Ming-Syan Chen and Philip S. Yu. 1992. Interleaving a join sequence with semijoins in distributed query processing. In IEEE Transactions on Parallel and Distributed Systems, Vol. 3. 611--621.
[7]
Bailu Ding, Surajit Chaudhuri, and Vivek Narasayya. 2020. Bitvector-Aware Query Optimization for Decision Support Queries. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (SIGMOD '20). 2011--2026.
[8]
DuckDB. [n.d.]. Continuous Benchmarking. https://duckdb.org/benchmarks/.
[9]
Dmitry Duplyakin, Robert Ricci, Aleksander Maricq, Gary Wong, Jonathon Duerig, Eric Eide, Leigh Stoller, Mike Hibler, David Johnson, Kirk Webb, Aditya Akella, Kuangching Wang, Glenn Ricart, Larry Landweber, Chip Elliott, Michael Zink, Emmanuel Cecchet, Snigdhaswin Kar, and Prabodh Mishra. 2019. The Design and Operation of CloudLab. In Proceedings of the USENIX Annual Technical Conference (ATC). 1--14. https://www.flux.utah.edu/paper/duplyakin-atc19
[10]
William Endress. 2013. On-line Analytic Processing with Oracle Database 12c. Technical Report. Oracle Corporation. https://www.oracle.com/technetwork/database/options/olap/olap-wp-12c-1896136.pdf
[11]
Franz Färber, Sang Kyun Cha, Jürgen Primsch, Christof Bornhövd, Stefan Sigg, and Wolfgang Lehner. 2011. SAP HANA Database - Data Management for Modern Business Applications. In SIGMOD Record, Vol. 40. 45--51.
[12]
Raspberry Pi Foundation. [n.d.]. About us. https://www.raspberrypi.org/about/.
[13]
Raspberry Pi Foundation. [n.d.]. Raspberry Pi 4. https://www.raspberrypi.com/products/raspberry-pi-4-model-b/.
[14]
Raspberry Pi Foundation. [n.d.]. Raspberry Pi 4 Tech Specs. https://www.raspberrypi.com/products/raspberry-pi-4-model-b/specifications/.
[15]
Gerhard Häring. [n.d.]. sqlite3 --- DB-API 2.0 interface for SQLite databases. https://docs.python.org/3.10/library/sqlite3.html.
[16]
D. Richard Hipp. [n.d.]. 35% Faster Than The Filesystem. https://www.sqlite.org/fasterthanfs.html.
[17]
D. Richard Hipp. [n.d.]. About SQLite. https://www.sqlite.org/about.html.
[18]
D. Richard Hipp. [n.d.]. The Advantages Of Flexible Typing. https://www.sqlite.org/flextypegood.html.
[19]
D. Richard Hipp. [n.d.]. Appropriate Uses For SQLite. https://www.sqlite.org/whentouse.html.
[20]
D. Richard Hipp. [n.d.]. Architecture of SQLite. https://www.sqlite.org/arch.html.
[21]
D. Richard Hipp. [n.d.]. Atomic Commit In SQLite. https://sqlite.org/atomiccommit.html.
[22]
D. Richard Hipp. [n.d.]. Compile-time Options. https://www.sqlite.org/compile.html.
[23]
D. Richard Hipp. [n.d.]. Database File Format. https://www.sqlite.org/fileformat.html.
[24]
D. Richard Hipp. [n.d.]. Datatypes In SQLite. https://www.sqlite.org/datatype3.html.
[25]
D. Richard Hipp. [n.d.]. How SQLite Is Tested. https://www.sqlite.org/test.html.
[26]
D. Richard Hipp. [n.d.]. Long Term Support. https://sqlite.org/lts.html.
[27]
D. Richard Hipp. [n.d.]. Most Widely Deployed and Used Database Engine. https://www.sqlite.org/mostdeployed.html.
[28]
D. Richard Hipp. [n.d.]. The Next-Generation Query Planner. https://www.sqlite.org/queryplanner-ng.html.
[29]
D. Richard Hipp. [n.d.]. Release History. https://www.sqlite.org/changes.html.
[30]
D. Richard Hipp. [n.d.]. Single-file Cross-platform Database. https://www.sqlite.org/onefile.html.
[31]
D. Richard Hipp. [n.d.]. The SQLite Amalgamation. https://www.sqlite.org/amalgamation.html.
[32]
D. Richard Hipp. [n.d.]. The SQLite Bytecode Engine. https://www.sqlite.org/opcode.html.
[33]
D. Richard Hipp. [n.d.]. SQLite Is Serverless. https://www.sqlite.org/serverless.html.
[34]
D. Richard Hipp. [n.d.]. SQLite is Transactional. https://www.sqlite.org/transactional.html.
[35]
D. Richard Hipp. [n.d.]. TH3. https://sqlite.org/th3.html.
[36]
D. Richard Hipp. [n.d.]. Well-Known Users of SQLite. https://www.sqlite.org/famous.html.
[37]
Stratos Idreos, Fabian Groffen, Niels Nes, Stefan Manegold, Sjoerd Mullender, and Martin Kersten. 2012. MonetDB: Two Decades of Research in Column-oriented Database Architectures. In Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, Vol. 35. 40--45.
[38]
Kaggle. [n.d.]. Datasets. https://www.kaggle.com/docs/datasets.
[39]
Oliver Kennedy, Jerry Ajay, Geoffrey Challen, and Lukasz Ziarek. 2015. Pocket Data: The Need for TPC-MOBILE. In Technology Conference on Performance Evaluation and Benchmarking (TPCTC).
[40]
Motorola Microprocessor and Memory Technologies Group. [n.d.]. Product Brief Integrated Portable System Processor - DragonBall (TM). https://www.nxp.com/docs/en/product-brief/MC68328P.pdf.
[41]
Michael Mitzenmacher and Eli Upfal. 2005. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press.
[42]
Simo Neuvonen, Antoni Wolski, Markku Manner, and Vilho Raatikka. 2011. Telecom Application Transaction Processing Benchmark. http://tatpbenchmark.sourceforge.net
[43]
Pat O'Neil, Betty O'Neil, and Xuedong Chen. 2009. Star Schema Benchmark. https://www.cs.umb.edu/~poneil/StarSchemaB.PDF
[44]
Jignesh M. Patel, Harshad Deshmukh, Jianqiao Zhu, Navneet Potti, Zuyu Zhang, Marc Spehlmann, Hakan Memisoglu, and Saket Saurabh. 2018. Quickstep: A Data Platform Based on the Scaling-up Approach. In Proceedings of the VLDB Endowment (PVLDB). 663--676.
[45]
CoRecursive Podcast. [n.d.]. 066: The Untold Story of SQLite. https://corecursive.com/066-sqlite-with-richard-hipp/.
[46]
Martin Prammer, Suryadev Sahadevan Rajesh, Junda Chen, and Jignesh M. Patel. 2022. Introducing a Query Acceleration Path for Analytics in SQLite3. In 12th Annual Conference on Innovative Data Systems Research (CIDR '22).
[47]
Mark Raasveldt and Hannes Mühleisen. 2019. DuckDB: An Embeddable Analytical Database. In Proceedings of the 2019 International Conference on Management of Data (Amsterdam, Netherlands) (SIGMOD '19). Association for Computing Machinery, New York, NY, USA, 1981--1984.
[48]
Weisong Shi, Jie Cao, Quan Zhang, Youhuizi Li, and Lanyu Xu. 2016. Edge Computing: Vision and Challenges. IEEE Internet of Things Journal 3, 5 (2016), 637--646.
[49]
Transaction Processing Performance Council (TPC). 2010. TPC Benchmark C. http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-c_v5.11.0.pdf
[50]
Transaction Processing Performance Council (TPC). 2015. TPC Benchmark E. http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-e_v1.14.0.pdf
[51]
Transaction Processing Performance Council (TPC). 2021. TPC Benchmark H. http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-h_v3.0.0.pdf
[52]
Wes McKinney. 2010. Data Structures for Statistical Computing in Python. In Proceedings of the 9th Python in Science Conference, Stéfan van der Walt and Jarrod Millman (Eds.). 56 -- 61.
[53]
Marianne Winslett and Vanessa Braganholo. 2019. Richard Hipp Speaks Out on SQLite. In ACM SIGMOD Record, Vol. 48. 39--46. Issue 2.
[54]
Jianqiao Zhu, Navneet Potti, Saket Saurabh, and Jignesh M. Patel. 2017. Looking Ahead Makes Query Plans Robust. In Proceedings of the VLDB Endowment (PVLDB), Vol. 10. 889--900.

Cited By

View all
  • (2024)Optimizing Collections of Bloom Filters within a Space BudgetProceedings of the VLDB Endowment10.14778/3681954.368202017:11(3551-3564)Online publication date: 1-Jul-2024
  • (2024)Bf-Tree: A Modern Read-Write-Optimized Concurrent Larger-Than-Memory Range IndexProceedings of the VLDB Endowment10.14778/3681954.368201217:11(3442-3455)Online publication date: 1-Jul-2024
  • (2024)ZWAL: Rethinking Write-ahead Logs for ZNS SSDs with Zone AppendsProceedings of the 4th Workshop on Challenges and Opportunities of Efficient and Performant Storage Systems10.1145/3642963.3652203(9-16)Online publication date: 22-Apr-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 15, Issue 12
August 2022
551 pages
ISSN:2150-8097
Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 August 2022
Published in PVLDB Volume 15, Issue 12

Badges

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)242
  • Downloads (Last 6 weeks)28
Reflects downloads up to 21 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Optimizing Collections of Bloom Filters within a Space BudgetProceedings of the VLDB Endowment10.14778/3681954.368202017:11(3551-3564)Online publication date: 1-Jul-2024
  • (2024)Bf-Tree: A Modern Read-Write-Optimized Concurrent Larger-Than-Memory Range IndexProceedings of the VLDB Endowment10.14778/3681954.368201217:11(3442-3455)Online publication date: 1-Jul-2024
  • (2024)ZWAL: Rethinking Write-ahead Logs for ZNS SSDs with Zone AppendsProceedings of the 4th Workshop on Challenges and Opportunities of Efficient and Performant Storage Systems10.1145/3642963.3652203(9-16)Online publication date: 22-Apr-2024
  • (2024)μWheel: Aggregate Management for Streams and QueriesProceedings of the 18th ACM International Conference on Distributed and Event-based Systems10.1145/3629104.3666031(54-65)Online publication date: 24-Jun-2024
  • (2024)MemSnap μCheckpoints: A Data Single Level Store for Fearless PersistenceProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 310.1145/3620666.3651334(622-638)Online publication date: 27-Apr-2024
  • (2023)Simple Adaptive Query Processing vs. Learned Query Optimizers: Observations and AnalysisProceedings of the VLDB Endowment10.14778/3611479.361150116:11(2962-2975)Online publication date: 24-Aug-2023
  • (2023)LUNAR: A Native Table Engine for Embedded DevicesProceedings of the 24th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems10.1145/3589610.3596276(122-133)Online publication date: 13-Jun-2023
  • (2023)Software Resurrection: Discovering Programming Pearls by Showing Modernity to Historical SoftwareProceedings of the 45th International Conference on Software Engineering: Software Engineering Education and Training10.1109/ICSE-SEET58685.2023.00016(101-106)Online publication date: 17-May-2023
  • (2023)Fine-Grained Tuple Transfer for Pipelined Query Execution on CPU-GPU CoprocessorDatabase Systems for Advanced Applications10.1007/978-3-031-30637-2_2(19-34)Online publication date: 17-Apr-2023

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media