Designing types for R, empirically

Abstract
The R programming language is widely used in a variety of domains. It was designed to favor an interactive style of programming with minimal syntactic and conceptual overhead. This design is well suited to data analysis but a poor fit for tools such as compilers or program analyzers. In particular, R has no type annotations, and all operations are dynamically checked at run time. Our work starts from two questions: What expressive power is needed to accurately type R code, and which type system is the R community willing to adopt? Both questions are difficult to answer without actually experimenting with a type system. The goal of this paper is to provide data that can feed into that design process. To this end, we perform a large corpus analysis to gain insight into the degree of polymorphism exhibited by idiomatic R code and to explore the potential benefits that the R community could accrue from a simple type system. First, we infer type signatures for 25,215 functions from 412 packages among the most widely used open source R libraries. We then conduct an evaluation on 8,694 clients of these packages, as well as on end-user code from the Kaggle data science competition website.
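The abstract does not spell out how the type signatures are inferred. The Python sketch below makes one assumption explicit: signatures are built by recording the R type of each argument and return value observed at run time, then taking the union per parameter. Under that assumption, it shows how observed types could be aggregated and how a function could be classified as monomorphic or polymorphic. The trace format, the `<ret>` marker, the package and function names, and the records are all hypothetical. The classification rule is one plausible definition, not necessarily the one used in the study.

```python
from collections import defaultdict

# Hypothetical trace records: (function, parameter, observed R type).
# "<ret>" marks the return value. These are made-up examples, not data from the study.
TRACE = [
    ("pkg::scale2", "x", "double"),
    ("pkg::scale2", "x", "integer"),
    ("pkg::scale2", "<ret>", "double"),
    ("pkg::label", "s", "character"),
    ("pkg::label", "s", "character"),
    ("pkg::label", "<ret>", "character"),
]


def infer_signatures(trace):
    """Union the observed types of each parameter into one signature per function."""
    sigs = defaultdict(lambda: defaultdict(set))
    for fn, param, ty in trace:
        sigs[fn][param].add(ty)
    return sigs


def is_monomorphic(signature):
    """Assumed rule: monomorphic if every parameter (and the return) saw exactly one type."""
    return all(len(types) == 1 for types in signature.values())


def render(fn, signature):
    """Pretty-print a signature as  fn: (p1: t1 | t2, ...) -> ret."""
    def union(types):
        return " | ".join(sorted(types))

    params = ", ".join(
        f"{p}: {union(ts)}" for p, ts in signature.items() if p != "<ret>"
    )
    # .get does not trigger the defaultdict factory; "?" means no return was observed.
    ret = union(signature.get("<ret>", {"?"}))
    return f"{fn}: ({params}) -> {ret}"


if __name__ == "__main__":
    sigs = infer_signatures(TRACE)
    for fn, sig in sigs.items():
        tag = "monomorphic" if is_monomorphic(sig) else "polymorphic"
        print(render(fn, sig), f"[{tag}]")
```

A signature built this way is only as general as the executions that produced it. For example, a parameter that was never exercised with `NULL` will not admit `NULL` in its inferred type.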