Designing types for R, empirically

Published: 13 November 2020

Abstract

The R programming language is widely used in a variety of domains. It was designed to favor an interactive style of programming with minimal syntactic and conceptual overhead. This design is well suited to data analysis, but a poor fit for tools such as compilers or program analyzers. In particular, R has no type annotations, and all operations are dynamically checked at run-time. The starting point for our work is two questions: what expressive power is needed to accurately type R code, and which type system is the R community willing to adopt? Both questions are difficult to answer without actually experimenting with a type system. The goal of this paper is to provide data that can feed into that design process. To this end, we perform a large corpus analysis to gain insight into the degree of polymorphism exhibited by idiomatic R code and to explore the potential benefits that the R community could accrue from a simple type system. As a starting point, we infer type signatures for 25,215 functions from 412 packages among the most widely used open source R libraries. We then conduct an evaluation on 8,694 clients of these packages, as well as on end-user code from the Kaggle data science competition website.

Supplemental Material

Auxiliary Presentation Video

This is a video presentation of the paper "Designing Types for R, Empirically", which appears at OOPSLA '20. In our paper, we propose a type annotation framework for R functions. We also undertake a large empirical study, collecting data on how R programmers use the language’s rich dynamic types and querying this data to help validate our type language design. This video presentation shows a cross section of our work: we detail the design and evaluation process for R’s vectorized primitive types, shedding light on our approach to retrofitting a type checking framework onto a dynamic language.
