Abstract
Motivated by streaming and data analytics scenarios where many queries operate on the same data and perform similar computations, we propose program consolidation for merging multiple user-defined functions (UDFs) that operate on the same input. Program consolidation exploits common computations between UDFs to generate an equivalent optimized function whose execution cost is often much smaller (and never greater) than the sum of the costs of executing each function individually. We present a sound consolidation calculus and an effective algorithm for consolidating multiple UDFs. Our approach is purely static and uses symbolic SMT-based techniques to identify shared or redundant computations. We have implemented the proposed technique on top of the Naiad data processing system. Our experiments show that our algorithm dramatically improves overall job completion time when executing user-defined filters that operate on the same data and perform similar computations.
- Linq project,. URL http://msdn.microsoft.com/en-us/library/vstudio/bb397926.aspx.Google Scholar
- Msdn linq documentation,. URL http://msdn.microsoft.com/en-us/library/bb669073(v=vs.110).aspx.Google Scholar
- Storm project. URL http://storm-project.net.Google Scholar
- Wordcount example. URL https://github.com/MicrosoftResearchSVC/Naiad/blob/release_0.2/Examples/.Google Scholar
- R.-J. Back. A calculus of refinements for program derivations. Acta Inf., 25(6):593--624, 1988. Google Scholar
Digital Library
- N. Bruno, S. Agarwal, S. Kandula, B. Shi, M. Wu, and J. Zhou. Recurring job optimization in scope. SIGMOD, pages 805--806, 2012. Google Scholar
Digital Library
- S. Chaudhuri and K. Shim. Optimization of queries with user-defined predicates. ACM Trans. Database Syst., 24(2):177--228, 1999. Google Scholar
Digital Library
- D. Coutts, R. Leshchinskiy, and D. Stewart. Stream fusion: From lists to streams to nothing at all. ICFP, pages 315--326, 2007. Google Scholar
Digital Library
- L. De Moura and N. Bjørner. Z3: An efficient smt solver. In TACAS, pages 337--340. Springer, 2008. Google Scholar
Digital Library
- E. W. Dijkstra. A Discipline of Programming. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1976. Google Scholar
Digital Library
- S. Finkelstein. Common expression analysis in database applications. SIGMOD '82, pages 235--245, 1982. Google Scholar
Digital Library
- G. Giannikis, G. Alonso, and D. Kossmann. SharedDB: killing one thousand queries with one stone. VLDB, 5(6):526--537, Feb. 2012. Google Scholar
Digital Library
- A. Gill, J. Launchbury, and S. L. Peyton Jones. A short cut to deforestation. FPCA '93, pages 223--232, 1993. Google Scholar
Digital Library
- P. K. Gunda, L. Ravindranath, C. A. Thekkath, Y. Yu, and L. Zhuang. Nectar: automatic management of data and computation in datacenters. OSDI'10, pages 1--8, 2010. Google Scholar
Digital Library
- M. Haghighat and C. D. Polychronopoulos. Symbolic analysis: A basis for parallelization, optimization, and scheduling of programs. In LCPC, pages 567--585, 1993. Google Scholar
Digital Library
- E. C. R. Hehner. Formalization of time and space. Formal Asp. Comput., 10(3):290--306, 1998.Google Scholar
Cross Ref
- F. Hueske, M. Peters, M. J. Sax, A. Rheinländer, R. Bergmann, A. Krettek, and K. Tzoumas. Opening the black boxes in data flow optimization. Proc. VLDB Endow., 5(11):1256--1267, July 2012. Google Scholar
Digital Library
- N. D. Jones, C. K. Gomard, and P. Sestoft. Partial Evaluation and Automatic Program Generation. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1993. ISBN 0-13-020249-5. Google Scholar
Digital Library
- K. Krikellas, S. Viglas, and M. Cintra. Generating code for holistic query evaluation. In ICDE, pages 613--624, 2010.Google Scholar
Cross Ref
- E. Meijer, M. M. Fokkinga, and R. Paterson. Functional programming with bananas, lenses, envelopes and barbed wire. In FPCA, pages 124--144, 1991. Google Scholar
Digital Library
- C. Morgan. Programming from Specifications. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1990. ISBN 0-13-726225-6. Google Scholar
Digital Library
- K. Munagala, U. Srivastava, and J. Widom. Optimization of continuous queries with shared expensive filters. PODS'07, pages 215--224. Google Scholar
Digital Library
- D. G. Murray, M. Isard, and Y. Yu. Steno: automatic optimization of declarative queries. PLDI '11, pages 121--131, 2011. Google Scholar
Digital Library
- D. G. Murray, F. McSherry, R. Isaacs, M. Isard, P. Barham, and M. Abadi. Naiad: a timely dataflow system. In SOSP, pages 439--455, 2013. Google Scholar
Digital Library
- L. Neumeyer, B. Robbins, A. Nair, and A. Kesari. S4: Distributed stream computing platform. ICDMW '10, pages 170--177, 2010. Google Scholar
Digital Library
- C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. Pig latin: a not-so-foreign language for data processing. In SIGMOD, pages 1099--1110, 2008. Google Scholar
Digital Library
- J. Park and A. Segev. Using common subexpressions to optimize multiple queries. In ICDE, pages 311--319, 1988. Google Scholar
Digital Library
- G. Press. Scaling users, not data: Sisense new take on machine learning and crowdsourcing. URL http://aka.ms/Fzo7sh.Google Scholar
- W. Pugh. The Omega Test: a fast and practical integer programming algorithm for dependence analysis. CACM, 8:4--13, 1992.Google Scholar
- T. K. Sellis. Multiple-query optimization. ACM Trans. Database Syst., 13(1):23--52, Mar. 1988. Google Scholar
Digital Library
- Y. N. Silva, P. Larson, and J. Zhou. Exploiting common subexpressions for cloud query processing. ICDE, pages 1337--1348, 2012. Google Scholar
Digital Library
- A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, S. Anthony, H. Liu, P. Wyckoff, and R. Murthy. Hive: a warehousing solution over a map-reduce framework. VLDB, 2(2):1626--1629, 2009. Google Scholar
Digital Library
- S. D. Viglas. Just-in-time compilation for sql query processing. Proc. VLDB Endow., 6(11):1190--1191, Aug. 2013. Google Scholar
Digital Library
- Y. Yu, M. Isard, D. Fetterly, M. Budiu, U. Erlingsson, P. Gunda, and J. Currey. Dryadlinq: a system for general-purpose distributed data-parallel computing using a high-level language. OSDI'08, pages 1--14. Google Scholar
Digital Library
- J. Zhang, H. Zhou, R. Chen, X. Fan, Z. Guo, H. Lin, J. Y. Li, W. Lin, J. Zhou, and L. Zhou. Optimizing data shuffling in data-parallel computation by understanding user-defined functions. NSDI'12, pages 22--22. Google Scholar
Digital Library
- J. Zhou, P.-A. Larson, J.-C. Freytag, and W. Lehner. Efficient exploitation of similar subexpressions for query processing. SIGMOD '07, pages 533--544, 2007. Google Scholar
Digital Library
Index Terms
Consolidation of queries with user-defined functions
Recommendations
Consolidation of queries with user-defined functions
PLDI '14: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and ImplementationMotivated by streaming and data analytics scenarios where many queries operate on the same data and perform similar computations, we propose program consolidation for merging multiple user-defined functions (UDFs) that operate on the same input. Program ...
Performance Analysis for Pareto-Optimal Green Consolidation Based on Virtual Machines Live Migration
Huge energy requirement of cloud data centers is prime concern. Dynamic Virtual Machine VM consolidation based on VM live migration to switched-off or put some of the under-loaded host Physical Machines PMs into a low power consumption mode can ...
Workload-aware database monitoring and consolidation
SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of dataIn most enterprises, databases are deployed on dedicated database servers. Often, these servers are underutilized much of the time. For example, in traces from almost 200 production servers from different organizations, we see an average CPU utilization ...







Comments