ABSTRACT
Join order selection plays an important role in DBMS query optimizers. The goal is to find the join order with the minimum cost; the problem is NP-hard in general, and its search space grows exponentially with the number of joined tables. Recent studies use deep reinforcement learning (DRL) to generate better join plans than those produced by conventional query optimizers. However, DRL-based methods require time-consuming training, which makes them unsuitable for online applications that need frequent periodic retraining. In this paper, we propose a novel framework, namely efficient Join Order selection learninG with Graph-basEd Representation (JOGGER). We first construct a schema graph from primary-foreign key relationships and learn table representations from it that capture the correlations between tables. The second component is the state representation: a graph convolutional network (GCN) encodes the query graph, and a tailored tree-based attention module encodes the join plan. To speed up the convergence of DRL training, we adopt curriculum learning, in which queries are incrementally added to the training set in order of increasing difficulty. Extensive experiments on the JOB and TPC-H datasets demonstrate the effectiveness and efficiency of the proposed solution.
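The sketch below shows how quickly the join-order search space grows. It counts join trees over n relations under two assumptions: cross products are allowed, and swapping the two inputs of a join yields a distinct plan. Under these assumptions there are n! left-deep trees and (2n-2)!/(n-1)! bushy trees. This only illustrates exponential growth and does not prove NP-hardness. The function names are ours and are not taken from JOGGER.

```python
from math import factorial


def left_deep_plans(n: int) -> int:
    """Count left-deep join trees over n relations: n! (cross products allowed)."""
    if n < 1:
        raise ValueError("need at least one relation")
    return factorial(n)


def bushy_plans(n: int) -> int:
    """Count bushy join trees over n relations: (2n-2)! / (n-1)!.

    This is (n-1)! * Catalan(n-1) tree shapes... times labelings, i.e. every
    binary tree shape with n leaves combined with every assignment of relations
    to leaves (left/right children are treated as distinct).
    """
    if n < 1:
        raise ValueError("need at least one relation")
    return factorial(2 * n - 2) // factorial(n - 1)


for n in (2, 4, 8, 16):
    print(f"{n:>2} relations: {left_deep_plans(n):,} left-deep, {bushy_plans(n):,} bushy")
```

For n = 4 this gives 24 left-deep and 120 bushy trees. The gap widens rapidly as n grows, which is why exhaustive enumeration becomes infeasible for large join queries.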
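According to the abstract, a graph convolutional network encodes the query graph. The sketch below shows one GCN layer over a query graph in which nodes are tables and edges are join predicates. It assumes the widely used symmetric-normalized propagation rule of Kipf and Welling, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W). The node features, the weights, and the mean-pooling readout are hypothetical placeholders and are not JOGGER's actual architecture or learned parameters.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def normalized_adjacency(n: int, edges: Sequence[Tuple[int, int]]) -> Matrix:
    """Return D^-1/2 (A + I) D^-1/2 for an undirected graph with n nodes."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # self-loops
    for u, v in edges:  # each join predicate links two tables
        a[u][v] = a[v][u] = 1.0
    deg = [sum(row) for row in a]  # degrees including the self-loop
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(a_norm: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    z = matmul(matmul(a_norm, h), w)
    return [[max(0.0, x) for x in row] for row in z]


def mean_pool(h: Matrix) -> List[float]:
    """Average node embeddings into one query-level vector (hypothetical readout)."""
    return [sum(col) / len(h) for col in zip(*h)]


# Hypothetical chain query: table 0 joins table 1, and table 1 joins table 2.
a_norm = normalized_adjacency(3, [(0, 1), (1, 2)])
h0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # placeholder table embeddings
w = [[1.0, 0.0], [0.0, 1.0]]               # placeholder weights (identity)
h1 = gcn_layer(a_norm, h0, w)
query_vec = mean_pool(h1)
```

In this sketch, each table's embedding is mixed with the embeddings of the tables it joins with. Stacking such layers lets information travel further along the join graph.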
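The curriculum described above adds queries to the training set from easy to hard. The sketch below builds cumulative training pools under two assumptions of our own: difficulty is approximated by the number of joined tables, and queries are split evenly across a fixed number of stages. The abstract does not specify JOGGER's difficulty measure or stage schedule, so both the proxy and the schedule are hypothetical.

```python
import math
from typing import Dict, List


def curriculum_stages(difficulty: Dict[str, int], num_stages: int) -> List[List[str]]:
    """Return cumulative training pools, easiest queries first.

    Stage s holds the easiest ceil(s * N / num_stages) queries, so each stage
    extends the previous one and the final stage contains all N queries.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    # Sort by difficulty; break ties by query id for a deterministic order.
    ordered = sorted(difficulty, key=lambda q: (difficulty[q], q))
    return [
        ordered[: math.ceil(len(ordered) * s / num_stages)]
        for s in range(1, num_stages + 1)
    ]


# Hypothetical workload: query id -> number of joined tables (difficulty proxy).
workload = {"q1": 3, "q2": 8, "q3": 5, "q4": 12}
for stage, pool in enumerate(curriculum_stages(workload, 2), start=1):
    print(stage, pool)  # a DRL training loop would train on `pool` at this stage
```

With two stages, the agent first trains on `q1` and `q3` and then on all four queries. Because the pools are cumulative, queries learned earlier stay in the training set.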
Efficient Join Order Selection Learning with Graph-based Representation