research-article

Steno: automatic optimization of declarative queries

Published: 04 June 2011

Abstract

Declarative queries enable programmers to write data manipulation code without being aware of the underlying data structure implementation. By raising the level of abstraction above imperative code, they improve program readability and, crucially, create opportunities for automatic parallelization and optimization. For example, the Language Integrated Query (LINQ) extensions to C# allow the same declarative query to process both in-memory collections and datasets that are distributed across a compute cluster. However, our experiments show that declarative code runs several times slower serially than equivalent hand-optimized code. The cause is that it is implemented using run-time abstractions, such as iterators, that incur overhead from virtual function calls and superfluous instructions.
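The iterator overhead described above can be illustrated with a minimal sketch. It is written in Python rather than C#, and it is not LINQ's actual implementation.

The sketch models a query in the spirit of `data.Where(x => x % 2 == 0).Select(x => x * x).Sum()` as a chain of iterator objects. The class names `WhereIter` and `SelectIter` are hypothetical. In this chain, every element passes through a dynamically dispatched `__next__` call at each stage, plus a call through a function object for the predicate or projection. The hand-optimized version computes the same result in one loop, with the filter and projection written inline.

```python
class WhereIter:
    """Hypothetical filter stage: yields only the source elements that satisfy pred."""

    def __init__(self, source, pred):
        self.source = iter(source)
        self.pred = pred

    def __iter__(self):
        return self

    def __next__(self):
        # Pull from the upstream stage until an element passes the predicate.
        # Each pull is a dynamically dispatched __next__ call, and each test
        # is a call through a function object.
        while True:
            item = next(self.source)  # StopIteration ends the whole chain
            if self.pred(item):
                return item


class SelectIter:
    """Hypothetical projection stage: applies fn to each upstream element."""

    def __init__(self, source, fn):
        self.source = iter(source)
        self.fn = fn

    def __iter__(self):
        return self

    def __next__(self):
        return self.fn(next(self.source))


def query_with_iterators(data):
    # Models data.Where(x => x % 2 == 0).Select(x => x * x).Sum():
    # sum() drives SelectIter, which drives WhereIter, which drives the source.
    evens = WhereIter(data, lambda x: x % 2 == 0)
    squares = SelectIter(evens, lambda x: x * x)
    return sum(squares)


def query_hand_optimized(data):
    # The same computation as one loop, with the filter and projection inlined.
    total = 0
    for x in data:
        if x % 2 == 0:
            total += x * x
    return total


print(query_with_iterators(range(10)), query_hand_optimized(range(10)))  # 120 120
```

Both functions return the same result. Only the first pays a per-element, per-stage call overhead, and that overhead grows with the number of chained operators.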

To address this problem, we have developed Steno, which uses a combination of novel and well-known techniques to generate code for declarative queries that is almost as efficient as hand-optimized code. Steno translates a declarative LINQ query into type-specialized, inlined and loop-based imperative code. It eliminates chains of iterators from query execution, and optimizes nested queries. We have implemented Steno for uniprocessor, multiprocessor and distributed computing platforms, and show that, for a real-world distributed job, it can almost double the speed of end-to-end execution.
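The core idea of turning a chain of operators into a single loop can be sketched in a few lines of Python. This is a toy under stated assumptions, not Steno's code generator.

The hypothetical function `compile_query` works as follows:

- It accepts a query as a list of `("where", expr)` and `("select", expr)` pairs. Each `expr` is a Python expression over the loop variable `x`.
- It always ends the query with a sum.
- It emits the source of one fused loop, with the predicate and projection inlined as text. No iterator objects or per-element function-object calls remain.

The sketch leaves out type specialization and nested queries.

```python
def compile_query(ops):
    """Fuse a chain of where/select operators, followed by a sum, into one loop.

    ops is a list of (kind, expr) pairs, where kind is "where" or "select" and
    expr is a Python expression over the loop variable x (a hypothetical query
    format chosen for this sketch). Returns the compiled function and its source.
    """
    lines = ["def query(source):", "    acc = 0", "    for x in source:"]
    body = " " * 8  # indentation of the loop body
    for kind, expr in ops:
        if kind == "where":
            # A filter becomes an inline test that skips to the next element.
            lines.append(f"{body}if not ({expr}):")
            lines.append(f"{body}    continue")
        elif kind == "select":
            # A projection becomes an inline assignment to the loop variable.
            lines.append(f"{body}x = {expr}")
        else:
            raise ValueError(f"unsupported operator: {kind!r}")
    # The terminal sum accumulates directly inside the same loop.
    lines.append(f"{body}acc += x")
    lines.append("    return acc")
    src = "\n".join(lines)
    namespace = {}
    exec(src, namespace)  # compile the generated source into a real function
    return namespace["query"], src


query, src = compile_query([("where", "x % 2 == 0"), ("select", "x * x")])
print(src)
print(query(range(10)))  # 120
```

The printed source contains a single `for` loop in which the evenness test and the squaring appear inline. Generating specialized source and compiling it mirrors, in miniature, the approach of emitting imperative code for a query instead of composing generic run-time operators.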



    Published in

      • ACM SIGPLAN Notices, Volume 46, Issue 6 (PLDI '11), June 2011
        652 pages
        ISSN: 0362-1340
        EISSN: 1558-1160
        DOI: 10.1145/1993316

      • PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2011
        668 pages
        ISBN: 9781450306638
        DOI: 10.1145/1993498
        General Chair: Mary Hall
        Program Chair: David Padua

      Copyright © 2011 ACM

      Publisher

      Association for Computing Machinery
      New York, NY, United States
