research-article

Vectorization of apply to reduce interpretation overhead of R

Published: 23 October 2015

Abstract

R is a popular dynamic language designed for statistical computing. Despite R's huge user base, the inefficiency of R's language implementation has become a major pain point in everyday use, as well as an obstacle to applying R to large-scale analytics problems. The two most common approaches to improving the performance of dynamic languages are implementing more efficient interpretation strategies and extending the interpreter with a Just-In-Time (JIT) compiler. However, both approaches require significant changes to the interpreter, which complicates adoption by development teams. This paper presents a new approach to improving the execution efficiency of R programs by vectorizing the widely used Apply class of operations. Apply accepts two parameters: a function and a collection of input data elements. The standard implementation of Apply iteratively invokes the input function with each element in the data collection. Our approach combines data transformation and function vectorization to convert the looping-over-data execution of the standard Apply into a single invocation of a vectorized function that contains a sequence of vector operations over the input data. This conversion can significantly speed up the execution of Apply operations in R by reducing the number of interpretation steps. We implemented the vectorization transformation as an R package. To enable the optimization, the user only needs to invoke the package and can keep using a normal R interpreter without any changes. The evaluation shows that the proposed method delivers significant performance improvements for a collection of data analysis algorithm benchmarks. This is achieved without any native code generation and using only a single thread of execution.
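
The conversion described in the abstract can be illustrated with a small hand-written sketch in base R. This is not the paper's package or its transformation algorithm; the names f, f_vec, packed, and vectorized are hypothetical and chosen only for illustration. It assumes a function body made of arithmetic operators, which R already applies element-wise to whole vectors.

```r
# Scalar function: written to handle one element at a time.
f <- function(x) x * 2 + 1

# A collection of input data elements.
data <- as.list(1:5)

# Standard Apply: the interpreter invokes f once per element.
looped <- lapply(data, f)

# Hand-vectorized counterpart (hypothetical name): same body, but the
# arithmetic operators now act on a whole numeric vector at once.
f_vec <- function(xs) xs * 2 + 1

# Data transformation: pack the list into one vector, invoke f_vec a
# single time, then unpack so the result has the same shape as lapply's.
packed <- unlist(data)
vectorized <- as.list(f_vec(packed))

# Both execution styles produce the same result.
stopifnot(identical(looped, vectorized))
```

In this sketch the loop of five interpreted calls to f is replaced by one call to f_vec, which is the source of the reduced interpretation overhead the abstract refers to. Functions with control flow or non-vector operations would need more than this manual rewrite.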

    • Published in

      ACM SIGPLAN Notices, Volume 50, Issue 10
      OOPSLA '15
      October 2015
      953 pages
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/2858965
      Editor: Andy Gill

      OOPSLA 2015: Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications
      October 2015
      953 pages
      ISBN: 9781450336895
      DOI: 10.1145/2814270

      Copyright © 2015 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States
