Abstract
Declarative queries enable programmers to write data manipulation code without being aware of the underlying data structure implementation. By raising the level of abstraction above imperative code, they improve program readability and, crucially, create opportunities for automatic parallelization and optimization. For example, the Language Integrated Query (LINQ) extensions to C# allow the same declarative query to process both in-memory collections and datasets that are distributed across a compute cluster. However, our experiments show that declarative code runs several times slower in serial execution than equivalent hand-optimized code, because it is implemented using run-time abstractions (such as iterators) that incur overhead due to virtual function calls and superfluous instructions.
To address this problem, we have developed Steno, which uses a combination of novel and well-known techniques to generate code for declarative queries that is almost as efficient as hand-optimized code. Steno translates a declarative LINQ query into type-specialized, inlined and loop-based imperative code. It eliminates chains of iterators from query execution, and optimizes nested queries. We have implemented Steno for uniprocessor, multiprocessor and distributed computing platforms, and show that, for a real-world distributed job, it can almost double the speed of end-to-end execution.
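The contrast between a chain of iterators and a single fused loop can be illustrated with a minimal sketch. It is written in Python rather than C#, and it is not Steno's generated code: the operator names `where` and `select` and the two `sum_even_squares_*` functions are hypothetical stand-ins chosen to mirror LINQ-style operators. The first version builds one lazy iterator per query stage and makes indirect calls per element; the second is the kind of inlined, loop-based code that a hand-optimizing programmer might write for the same query.

```python
from typing import Callable, Iterable, Iterator


def where(source: Iterable[int], pred: Callable[[int], bool]) -> Iterator[int]:
    """Lazy filter operator (hypothetical analogue of a LINQ-style Where)."""
    for item in source:
        if pred(item):
            yield item


def select(source: Iterable[int], fn: Callable[[int], int]) -> Iterator[int]:
    """Lazy map operator (hypothetical analogue of a LINQ-style Select)."""
    for item in source:
        yield fn(item)


def sum_even_squares_iterators(xs: Iterable[int]) -> int:
    """Declarative style: each stage is a separate iterator, and every
    element passes through the chain via indirect calls."""
    return sum(select(where(xs, lambda x: x % 2 == 0), lambda x: x * x))


def sum_even_squares_fused(xs: Iterable[int]) -> int:
    """Fused style: one loop with the predicate and projection inlined,
    so no intermediate iterator objects are created."""
    total = 0
    for x in xs:
        if x % 2 == 0:
            total += x * x
    return total
```

Both functions compute the same result for any input; they differ only in how much per-element machinery sits between the data and the arithmetic, which is the overhead the abstract attributes to run-time abstractions such as iterators.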
Steno: automatic optimization of declarative queries
PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation