Fusing effectful comprehensions

Published: 14 June 2017

Abstract

List comprehensions provide a powerful abstraction mechanism for expressing computations over ordered collections of data declaratively without having to use explicit iteration constructs. This paper puts forth effectful comprehensions as an elegant way to describe list comprehensions that incorporate loop-carried state. This is motivated by operations such as compression/decompression and serialization/deserialization that are common in log/data processing pipelines and require loop-carried state when processing an input stream of data.

We build on the underlying theory of symbolic transducers to fuse pipelines of effectful comprehensions into a single representation, from which efficient code can be generated. Using background-theory reasoning with an SMT solver, our fusion and subsequent reachability-based branch elimination algorithms can significantly reduce the complexity of the fused pipelines. Our implementation shows significant speedups over reasonable hand-written code (3.4×, on average) and traditionally fused versions of the pipeline (2.6×, on average) for a variety of examples, including scenarios for extracting fields with regular expressions, processing XML with XPath, and running queries over encoded data.
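
To make the idea of fusing stages with loop-carried state concrete, the following is a minimal sketch in Python. It is not the paper's algorithm: it only models each stage as an initial state plus a step function and composes two stages into one that carries a pair of states, with no symbolic representation, SMT reasoning, or branch elimination. All names here (`run`, `fuse`, `delta_decode`, `new_maxima`) are hypothetical and chosen for illustration.

```python
from typing import Any, Callable, Iterable, Iterator, List, Tuple

# Assumption for this sketch: a stage is (initial_state, step), where
# step(state, x) returns (new_state, list_of_outputs_for_x).
Step = Callable[[Any, Any], Tuple[Any, List[Any]]]
Stage = Tuple[Any, Step]


def run(stage: Stage, xs: Iterable[Any]) -> Iterator[Any]:
    """Run one stage over an input stream, threading its state through."""
    state, step = stage
    for x in xs:
        state, outs = step(state, x)
        yield from outs


def fuse(first: Stage, second: Stage) -> Stage:
    """Compose two stages into one whose state is the pair of both states.

    Each input element is pushed through `first`, and every element it
    emits is pushed through `second` immediately, so the intermediate
    stream is never materialized as a whole.
    """
    (init1, step1), (init2, step2) = first, second

    def step(state: Tuple[Any, Any], x: Any) -> Tuple[Tuple[Any, Any], List[Any]]:
        s1, s2 = state
        s1, mids = step1(s1, x)
        outs: List[Any] = []
        for m in mids:
            s2, produced = step2(s2, m)
            outs.extend(produced)
        return (s1, s2), outs

    return ((init1, init2), step)


# Stage 1: delta decoding. The loop-carried state is the previous value.
delta_decode: Stage = (0, lambda prev, d: (prev + d, [prev + d]))


# Stage 2: emit a value only when it exceeds every value seen so far.
def _new_max_step(best: Any, v: Any) -> Tuple[Any, List[Any]]:
    if best is None or v > best:
        return v, [v]
    return best, []


new_maxima: Stage = (None, _new_max_step)

fused = fuse(delta_decode, new_maxima)

if __name__ == "__main__":
    deltas = [3, -1, 4, 0, 2]
    # The fused stage and the two-pass pipeline produce the same stream.
    print(list(run(fused, deltas)))
    print(list(run(new_maxima, run(delta_decode, deltas))))
```

The fused stage produces the same output as running the two stages back to back, but in a single loop over the input. The approach summarized in the abstract goes further by simplifying the combined control flow, which this sketch does not attempt.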

