DOI: 10.1145/3534678.3539303 (KDD conference proceedings)
research-article

Efficient Join Order Selection Learning with Graph-based Representation

Published: 14 August 2022

ABSTRACT

Join order selection plays an important role in DBMS query optimizers. The problem is to find the join order with the minimum cost, and it is generally NP-hard because the search space grows exponentially with the number of tables. Recent studies use deep reinforcement learning (DRL) to generate better join plans than those produced by conventional query optimizers. However, DRL-based methods require time-consuming training, which makes them unsuitable for online applications that need frequent periodic re-training. In this paper, we propose a novel framework, namely efficient Join Order selection learninG with Graph-basEd Representation (JOGGER). We first construct a schema graph based on the primary-foreign key relationships, from which table representations are learned to capture the correlations between tables. The second component is the state representation, where a graph convolutional network (GCN) is used to encode the query graph and a tailored tree-based attention module is designed to encode the join plan. To speed up the convergence of the DRL training process, we exploit the idea of curriculum learning, in which queries are incrementally added to the training set according to their level of difficulty. We conduct extensive experiments on the JOB and TPC-H datasets, which demonstrate the effectiveness and efficiency of the proposed solutions.
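To make the schema-graph idea concrete, the following is a minimal sketch, not the paper's implementation. It builds an undirected graph from primary-foreign key pairs and applies one neighbourhood-averaging step, a simplified stand-in for a graph convolution layer (no learned weights, no non-linearity). The table names, the one-hot initial features, and the helper names `build_schema_graph` and `propagate` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical primary-foreign key pairs (child table, parent table).
# The table names are illustrative and not taken from the paper.
FK_EDGES = [
    ("orders", "customer"),
    ("lineitem", "orders"),
    ("lineitem", "part"),
]


def build_schema_graph(fk_edges):
    """Return an undirected adjacency map with a self-loop on every table."""
    adj = defaultdict(set)
    for child, parent in fk_edges:
        adj[child].update({child, parent})
        adj[parent].update({parent, child})
    return {table: sorted(neigh) for table, neigh in adj.items()}


def propagate(adj, features):
    """One mean-aggregation step: each table averages its neighbours' vectors.

    Simplified stand-in for a graph convolution layer: there is no learned
    weight matrix and no non-linearity.
    """
    out = {}
    for table, neigh in adj.items():
        dim = len(features[table])
        out[table] = [
            sum(features[n][d] for n in neigh) / len(neigh) for d in range(dim)
        ]
    return out


adj = build_schema_graph(FK_EDGES)
tables = sorted(adj)
# One-hot initial features, one dimension per table (an assumption).
features = {t: [1.0 if t == u else 0.0 for u in tables] for t in tables}
hidden = propagate(adj, features)
```

After one step, each table's vector mixes in its key-connected neighbours, which is the sense in which graph-based representations can capture correlations between tables. A learned model would add trainable weights and stack several such layers.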

Supplemental Material

KDD-fp0821.mp4

Join order selection plays an important role in DBMS query optimizers. The problem is to find the join order with the minimum cost. Recent studies use deep reinforcement learning (DRL) to generate better join plans than those produced by conventional query optimizers. However, DRL-based methods require time-consuming training. In this paper, we propose a novel framework, namely efficient Join Order selection learninG with Graph-basEd Representation (JOGGER). We first construct a schema graph based on the primary-foreign key relationships, from which table representations are learned to capture the correlations between tables. The second component is the state representation, where a GCN is used to encode the query graph and a tailored tree-based attention module encodes the join plan. To speed up the convergence of training, we incorporate curriculum learning, in which queries are incrementally added to the training set according to their level of difficulty.
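The curriculum idea can be sketched in a few lines. This is a hedged illustration, not the authors' procedure: it assumes the number of joined tables is a reasonable proxy for query difficulty, and the query records, `difficulty`, and `curriculum_stages` are hypothetical names introduced here.

```python
# Hypothetical workload; each query lists the tables it joins.
QUERIES = [
    {"name": "q3", "tables": ["a", "b", "c", "d", "e"]},
    {"name": "q1", "tables": ["a", "b"]},
    {"name": "q4", "tables": ["a", "b", "c", "d"]},
    {"name": "q2", "tables": ["a", "b", "c"]},
]


def difficulty(query):
    """Proxy for difficulty: the number of tables joined (an assumption)."""
    return len(query["tables"])


def curriculum_stages(queries, num_stages):
    """Yield growing training sets, easiest queries first."""
    ordered = sorted(queries, key=difficulty)
    for stage in range(1, num_stages + 1):
        # Ceiling division so the final stage always covers every query.
        cutoff = (len(ordered) * stage + num_stages - 1) // num_stages
        yield ordered[:cutoff]


stages = list(curriculum_stages(QUERIES, num_stages=2))
stage_names = [[q["name"] for q in stage] for stage in stages]
```

In a training loop, the agent would train on each stage in turn, so harder queries only enter the training set once the easier ones have been seen.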
