Abstract
List comprehensions provide a powerful abstraction mechanism for expressing computations over ordered collections of data declaratively without having to use explicit iteration constructs. This paper puts forth effectful comprehensions as an elegant way to describe list comprehensions that incorporate loop-carried state. This is motivated by operations such as compression/decompression and serialization/deserialization that are common in log/data processing pipelines and require loop-carried state when processing an input stream of data.
We build on the underlying theory of symbolic transducers to fuse pipelines of effectful comprehensions into a single representation, from which efficient code can be generated. Using background theory reasoning with an SMT solver, our fusion and subsequent reachability-based branch elimination algorithms can significantly reduce the complexity of the fused pipelines. Our implementation shows significant speedups over reasonable hand-written code (3.4× on average) and over a traditionally fused version of the pipeline (2.6× on average) for a variety of examples, including scenarios for extracting fields with regular expressions, processing XML with XPath, and running queries over encoded data.
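As a rough illustration of the idea, the sketch below models a stage with loop-carried state as a pair of an initial state and a step function, and fuses two such stages into one whose state is the pair of the component states. This is only plain function composition in Python, not the symbolic-transducer fusion or the SMT-based branch elimination described above; the names `run`, `fuse`, `delta_decode` and `new_maxima` are hypothetical and do not come from the paper or its artifact.

```python
from typing import Any, Callable, Iterable, List, Tuple

# A stage is (initial_state, step), where step(state, x) -> (new_state, outputs).
# The explicit state is what makes the comprehension "effectful".
Step = Callable[[Any, Any], Tuple[Any, List[Any]]]
Stage = Tuple[Any, Step]


def run(stage: Stage, xs: Iterable[Any]) -> List[Any]:
    """Run one stage over an input sequence, threading its state through."""
    state, step = stage
    out: List[Any] = []
    for x in xs:
        state, ys = step(state, x)
        out.extend(ys)
    return out


def fuse(first: Stage, second: Stage) -> Stage:
    """Combine two stages into one stage with no intermediate list."""
    init1, step1 = first
    init2, step2 = second

    def step(state: Tuple[Any, Any], x: Any) -> Tuple[Tuple[Any, Any], List[Any]]:
        s1, s2 = state
        s1, mids = step1(s1, x)
        outs: List[Any] = []
        for m in mids:  # feed each intermediate element straight into stage two
            s2, ys = step2(s2, m)
            outs.extend(ys)
        return (s1, s2), outs

    return ((init1, init2), step)


def _delta_step(acc: int, d: int) -> Tuple[int, List[int]]:
    # Delta decoding: the state is the running sum of the deltas seen so far.
    total = acc + d
    return total, [total]


def _maxima_step(best: Any, v: int) -> Tuple[Any, List[int]]:
    # Emit a value only if it exceeds every value seen before it.
    if best is None or v > best:
        return v, [v]
    return best, []


delta_decode: Stage = (0, _delta_step)
new_maxima: Stage = (None, _maxima_step)

deltas = [3, -1, 4, 0, 2]
staged = run(new_maxima, run(delta_decode, deltas))  # builds an intermediate list
fused = run(fuse(delta_decode, new_maxima), deltas)  # single pass, paired state
print(staged, fused)
```

Both calls produce the same output; the fused version simply avoids materializing the decoded list between the two stages. Simplifying the combined step function further, for example by pruning branches that can never be reached, is the part that this sketch leaves out.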
Supplemental Material
This artifact includes the benchmarks described in the paper: Olli Saarikivi, Margus Veanes, Todd Mytkowicz, and Madan Musuvathi. Fusing Effectful Comprehensions. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'17). ACM, 2017. The artifact is a modified version of the Automata library, available at https://github.com/AutomataDotNet/Automata. See ReadMe.txt in the archive for usage instructions.