DOI: 10.1145/2463664.2465224
research-article

Communication steps for parallel query processing

Published: 22 June 2013

ABSTRACT

We consider the problem of computing a relational query q on a large input database of size n, using a large number p of servers. The computation is performed in rounds, and each server can receive only O(n/p^(1-ε)) bits of data, where ε ∈ [0,1] is a parameter that controls replication. We examine how many global communication steps are needed to compute q. We establish both lower and upper bounds, in two settings. For a single round of communication, we give lower bounds in the strongest possible model, where arbitrary bits may be exchanged; we show that any algorithm requires ε ≥ 1 - 1/τ*, where τ* is the fractional vertex covering number of the hypergraph of q. We also give an algorithm that matches the lower bound for a specific class of databases. For multiple rounds of communication, we present lower bounds in a model where routing decisions for a tuple are tuple-based. We show that for the class of tree-like queries there exists a tradeoff between the number of rounds and the space exponent ε. The lower bounds for multiple rounds are the first of their kind. Our results also imply that transitive closure cannot be computed in O(1) rounds of communication.
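As a rough illustration of the one-round bound ε ≥ 1 - 1/τ*, the sketch below computes τ* for small queries and derives the corresponding minimum space exponent. It is not taken from the paper: the function names are hypothetical, and the brute-force search assumes every atom has arity at most 2, where the vertex cover linear program is known to have a half-integral optimum. Queries with wider atoms would need a general LP solver instead.

```python
from fractions import Fraction
from itertools import product


def fractional_vertex_cover(atoms):
    """Return tau* for a query given as a list of atoms (tuples of variable names).

    Assumption: every atom mentions at most two distinct variables, so it is
    enough to search weights in {0, 1/2, 1} (half-integrality on graphs).
    """
    if any(len(set(atom)) > 2 for atom in atoms):
        raise ValueError("this sketch only handles atoms of arity <= 2")
    variables = sorted({v for atom in atoms for v in atom})
    weights = (Fraction(0), Fraction(1, 2), Fraction(1))
    best = None
    for assignment in product(weights, repeat=len(variables)):
        w = dict(zip(variables, assignment))
        # A cover must give every atom a total weight of at least 1.
        if all(sum(w[v] for v in set(atom)) >= 1 for atom in atoms):
            total = sum(assignment, Fraction(0))
            if best is None or total < best:
                best = total
    return best


def min_space_exponent(tau_star):
    """Lower bound on epsilon for one round: epsilon >= 1 - 1/tau*."""
    return 1 - 1 / Fraction(tau_star)


def per_server_load(n, p, eps):
    """The quantity n / p^(1 - eps) inside the O(.) load bound, as a float."""
    return n / p ** (1 - float(eps))


# Triangle query: R(x,y), S(y,z), T(z,x)
triangle = [("x", "y"), ("y", "z"), ("z", "x")]
tau = fractional_vertex_cover(triangle)
print(tau, min_space_exponent(tau))          # 3/2 1/3

# Two-way join: R(x,y), S(y,z)
chain = [("x", "y"), ("y", "z")]
tau_chain = fractional_vertex_cover(chain)
print(tau_chain, min_space_exponent(tau_chain))  # 1 0
```

Under these assumptions the two-way join can be computed in one round with no replication (ε = 0), while the triangle query needs ε of at least 1/3; `per_server_load` turns such an exponent into the per-server quantity n/p^(1-ε) for concrete n and p.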


    Published in

      PODS '13: Proceedings of the 32nd ACM SIGMOD-SIGACT-SIGAI symposium on Principles of database systems
      June 2013
      334 pages
      ISBN:9781450320665
      DOI:10.1145/2463664
      General Chair: Richard Hull
      Program Chair: Wenfei Fan

      Copyright © 2013 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      PODS '13 Paper Acceptance Rate: 24 of 97 submissions, 25%
      Overall Acceptance Rate: 476 of 1,835 submissions, 26%
