ABSTRACT

Efficient join processing is one of the most fundamental and well-studied tasks in database research. In this work, we examine algorithms for natural join queries over many relations and describe a novel algorithm to process these queries optimally in terms of worst-case data complexity. Our result builds on recent work by Atserias, Grohe, and Marx, who gave bounds on the size of a full conjunctive query in terms of the sizes of the individual relations in the body of the query. These bounds, however, are not constructive: they rely on Shearer's entropy inequality which is information-theoretic. Thus, the previous results leave open the question of whether there exist algorithms whose running time achieve these optimal bounds. An answer to this question may be interesting to database practice, as we show in this paper that any project-join plan is polynomially slower than the optimal bound for some queries. We construct an algorithm whose running time is worst-case optimal for all natural join queries. Our result may be of independent interest, as our algorithm also yields a constructive proof of the general fractional cover bound by Atserias, Grohe, and Marx without using Shearer's inequality. In addition, we show that this bound is equivalent to a geometric inequality by Bollobás and Thomason, one of whose special cases is the famous Loomis-Whitney inequality. Hence, our results algorithmically prove these inequalities as well. Finally, we discuss how our algorithm can be used to compute a relaxed notion of joins.
- S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995. Google Scholar
Digital Library
- N. Alon, P. B. Gibbons, Y. Matias, and M. Szegedy. Tracking join and self-join sizes in limited storage. In PODS, pages 10--20, 1999. Google Scholar
Digital Library
- N. Alon, I. Newman, A. Shen, G. Tardos, and N. K. Vereshchagin. Partitioning multi-dimensional sets in a small number of "uniform" parts. Eur. J. Comb., 28(1):134--144, 2007. Google Scholar
Digital Library
- N. Alon, R. Yuster, and U. Zwick. Finding and counting given length cycles. Algorithmica, 17(3):209--223, 1997.Google Scholar
- A. Atserias, M. Grohe, and D. Marx. Size bounds and query plans for relational joins. In FOCS, pages 739--748. IEEE, 2008. Google Scholar
Digital Library
- R. Avnur and J. M. Hellerstein. Eddies: Continuously adaptive query processing. In SIGMOD Conference, pages 261--272, 2000. Google Scholar
Digital Library
- S. Babu, P. Bizarro, and D. J. DeWitt. Proactive reoptimization. In SIGMOD Conference, pages 107--118, 2005. Google Scholar
Digital Library
- B. Bollobás and A. Thomason. Projections of bodies and hereditary properties of hypergraphs. Bull. London Math. Soc., 27(5), 1995.Google Scholar
- F. R. K. Chung, R. L. Graham, P. Frankl, and J. B. Shearer. Some intersection theorems for ordered sets and graphs. J. Combin. Theory Ser. A, 43(1):23--37, 1986. Google Scholar
Digital Library
- A. Deligiannakis, M. N. Garofalakis, and N. Roussopoulos. Extended wavelets for multiple measures. TODS, 32(2):10, 2007. Google Scholar
Digital Library
- J. Flum, M. Frick, and M. Grohe. Query evaluation via tree-decompositions. J. ACM, 49(6):716--752, 2002. Google Scholar
Digital Library
- A. C. Gilbert, H. Q. Ngo, E. Porat, A. Rudra, and M. J. Strauss. Efficiently decodable l 2/l 2 for each compressed sensing with tiny failure probability, November 2011. Manuscript.Google Scholar
- G. Gottlob, S. T. Lee, and G. Valiant. Size and treewidth bounds for conjunctive queries. In PODS, pages 45--54, 2009. Google Scholar
Digital Library
- G. Gottlob, N. Leone, and F. Scarcello. Hypertree decompositions: A survey. In MFCS, 2001. Google Scholar
Digital Library
- G. Gottlob, Z. Miklós, and T. Schwentick. Generalized hypertree decompositions: np-hardness and tractable variants. In PODS, 2007. Google Scholar
Digital Library
- G. Graefe. Query evaluation techniques for large databases. ACM Computing Surveys, 25(2):73--170, June 1993. Google Scholar
Digital Library
- M. Grohe and D. Marx. Constraint solving via fractional edge covers. In SODA, pages 289--298, 2006. Google Scholar
Digital Library
- K. Gyarmati, M. Matolcsi, and I. Z. Ruzsa. A superadditivity and submultiplicativity property for cardinalities of sumsets. Combinatorica, 30(2):163--174, 2010. Google Scholar
Digital Library
- T. S. Han. Nonnegative entropy measures of multivariate symmetric correlations. Information and Control, 36(2):133--156, 1978.Google Scholar
Cross Ref
- Y. E. Ioannidis. The history of histograms (abridged). In VLDB, 2003. Google Scholar
Digital Library
- Y. E. Ioannidis and S. Christodoulakis. On the propagation of errors in the size of join results. In SIGMOD Conference, 1991. Google Scholar
Digital Library
- D. Irony, S. Toledo, and A. Tiskin. Communication lower bounds for distributed-memory matrix multiplication. J. Parallel Distrib. Comput., 64(9):1017--1026, 2004. Google Scholar
Digital Library
- H. V. Jagadish, N. Koudas, S. Muthukrishnan, V. Poosala, K. C. Sevcik, and T. Suel. Optimal Histograms with Quality Guarantees. In VLDB, 1998. Google Scholar
Digital Library
- A. C. König and G. Weikum. Combining Histograms and Parametric Curve Fitting for Feedback-Driven Query Result-size Estimation. In VLDB, 1999. Google Scholar
Digital Library
- A. R. Lehman and E. Lehman. Network coding: does the model need tuning? In SODA, pages 499--504, 2005. Google Scholar
Digital Library
- L. H. Loomis and H. Whitney. An inequality related to the isoperimetric inequality. Bull. Amer. Math. Soc, 55:961--962, 1949.Google Scholar
Cross Ref
- R. Lyons. Probability on trees and networks, jun 2011. with Yuval Peres url: http://php.indiana.edu/ rdlyons/prbtree/prbtree.html.Google Scholar
- V. Markl, N. Megiddo, M. Kutsch, T. M. Tran, P. J. Haas, and U. Srivastava. Consistently estimating the selectivity of conjuncts of predicates. In VLDB, pages 373--384, 2005. Google Scholar
Digital Library
- D. Marx. Tractable hypergraph properties for constraint satisfaction and conjunctive queries. In STOC, pages 735--744, 2010. Google Scholar
Digital Library
- H. Q. Ngo, E. Porat, C. Ré, and A. Rudra. Worst-case optimal join algorithms, 2012. arXiv:1203.1952 {cs.DB}.Google Scholar
- H. Q. Ngo, E. Porat, and A. Rudra. Personal Communciation.Google Scholar
- A. Pagh and R. Pagh. Scalable computation of acyclic joins. In PODS, pages 225--232, 2006. Google Scholar
Digital Library
- V. Poosala, Y. Ioannidis, P. Haas, and E. J. Shekita. Improved histograms for selectivity estimation of range predicates. In SIGMOD, pages 294--305, 1996. Google Scholar
Digital Library
- A. Schrijver. Combinatorial optimization. Polyhedra and efficiency. Vol. A, volume 24 of Algorithms and Combinatorics. Springer-Verlag, Berlin, 2003.Google Scholar
- U. Srivastava, P. J. Haas, V. Markl,M. Kutsch, and T. M. Tran. Isomer: Consistent histogram construction using query feedback. In ICDE, page 39, 2006. Google Scholar
Digital Library
- K. Tzoumas, A. Deshpande, and C. S. Jensen. Lightweight graphical models for selectivity estimation without independence assumptions. PVLDB, 4(11):852--863, 2011.Google Scholar
Digital Library
- L. G. Valiant and V. V. Vazirani. NP is as easy as detecting unique solutions. Theor. Comput. Sci., 47(3):85--93, 1986. Google Scholar
Digital Library
- D. E. Willard. Applications of range query theory to relational data base join and selection operations. J. Comput. Syst. Sci., 52(1), 1996. Google Scholar
Digital Library
- Y. Xu, P. Kostamaa, X. Zhou, and L. Chen. Handling data skew in parallel joins in shared-nothing systems. In SIGMOD, 2008. Google Scholar
Digital Library
Index Terms
Worst-case optimal join algorithms: [extended abstract]
Recommendations
Worst-case Optimal Join Algorithms
Efficient join processing is one of the most fundamental and well-studied tasks in database research. In this work, we examine algorithms for natural join queries over many relations and describe a new algorithm to process these queries optimally in ...
Relations between Average-Case and Worst-Case Complexity
The consequences of the worst-case assumption NP=P are very well understood. On the other hand, we only know a few consequences of the analogous average-case assumption “NP is easy on average.” In this paper we establish several new results on the worst-...
Average-case hardness of NP from exponential worst-case hardness assumptions
STOC 2021: Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of ComputingA long-standing and central open question in the theory of average-case complexity is to base average-case hardness of NP on worst-case hardness of NP. A frontier question along this line is to prove that PH is hard on average if UP requires (sub-)...






Comments