Eliminating abstraction overhead of Java stream pipelines using ahead-of-time program optimization

Published: 13 November 2020

Abstract

Java 8 introduced streams that allow developers to work with collections of data using functional-style operations. Streams are often used in pipelines of operations for processing the data elements, which leads to concise and elegant program code. However, the declarative data processing style comes at a cost. Compared to processing the data with traditional imperative language mechanisms, constructing stream pipelines requires extra heap objects and virtual method calls, which often results in significant run-time overheads.

In this work we investigate how to mitigate these overheads to enable processing data in the declarative style without sacrificing performance. We argue that ahead-of-time bytecode-to-bytecode transformation is a suitable approach to optimization of stream pipelines, and we present a static analysis that is designed to guide such transformations. Experimental results show a significant performance gain, and that the technique works for realistic stream pipelines. For 10 of 11 micro-benchmarks, the optimizer is able to produce bytecode that is as effective as hand-written imperative-style code. Additionally, 77% of 6879 stream pipelines found in real-world Java programs are optimized successfully.
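To make the contrast concrete, the following is a minimal illustrative sketch, not taken from the paper: the same computation written once as a stream pipeline and once as a hand-written imperative loop. The class name `StreamVsLoop`, both method names, and the sample data are hypothetical and chosen only for illustration. The loop stands for the kind of imperative-style code that the abstract uses as a performance baseline; it is not claimed to be the output of the authors' optimizer.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical example class; names are illustrative assumptions.
final class StreamVsLoop {

    // Declarative style: stream(), filter() and mapToInt() each build a
    // pipeline stage object, and the lambdas are invoked through
    // functional-interface (virtual) calls for every element.
    static int sumOfEvenSquaresStream(List<Integer> values) {
        return values.stream()
                .filter(v -> v % 2 == 0)   // keep even numbers
                .mapToInt(v -> v * v)      // square each one
                .sum();                    // terminal operation
    }

    // Imperative style: one loop, no pipeline objects and no lambda calls.
    static int sumOfEvenSquaresLoop(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            if (v % 2 == 0) {
                sum += v * v;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6);
        System.out.println(sumOfEvenSquaresStream(data)); // prints 56
        System.out.println(sumOfEvenSquaresLoop(data));   // prints 56
    }
}
```

Both methods return the same result; they differ only in how much object allocation and indirect dispatch the runtime has to perform or optimize away.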


