Abstract
Many researchers have explored type inference for dynamic languages. However, traditional type inference computes the most general types, which can be verbose, complex, and difficult to understand for the complex type systems often needed to type dynamic languages. In this paper, we introduce SimTyper, a Ruby type inference system that aims to infer usable types, specifically nominal and generic types, that match the types programmers write. SimTyper builds on InferDL, a recent Ruby type inference system that soundly combines standard type inference with heuristics. The key novelty of SimTyper is type equality prediction, a new, machine learning-based technique that predicts when method arguments or returns are likely to have the same type. SimTyper finds pairs of positions that are predicted to have the same type but where one has a verbose, overly general solution and the other has a usable solution. It then guesses that the two types are equal, keeping the guess if it is consistent with the rest of the program and discarding it if not. Because every guess is checked in this way, the types inferred by SimTyper are guaranteed to be sound. To perform type equality prediction, we introduce the deep similarity (DeepSim) neural network. DeepSim is a novel machine learning classifier that follows the Siamese network architecture and uses CodeBERT, a pre-trained model, to embed source tokens into vectors that capture both the tokens and their contexts. DeepSim is trained on 100,000 pairs labeled with type similarity information extracted from 371 Ruby programs whose types are manually documented but not checked. We evaluated SimTyper on eight Ruby programs and found that, compared to standard type inference, SimTyper finds 69% more types that match programmer-written type information. Moreover, DeepSim can predict rare types that appear neither in the Ruby standard library nor in the training data. Our results show that type equality prediction can help type inference systems effectively produce more usable types.
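The guess-and-check step described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not SimTyper's implementation. The equality predictions are taken as a given list of position pairs (standing in for DeepSim's output). A "verbose" solution is modeled as a structural type listing the methods a value must support. The consistency check is a hypothetical stand-in that only verifies that the guessed nominal type defines those methods, whereas the abstract describes checking the guess against the rest of the program. The class table, the `Structural` class, and all position names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple, Union

# Hypothetical class table: usable (nominal/generic) type -> methods it defines.
CLASS_METHODS: Dict[str, FrozenSet[str]] = {
    "Integer": frozenset({"+", "-", "to_s", "times"}),
    "String": frozenset({"+", "to_s", "upcase", "length"}),
    "Array<String>": frozenset({"each", "map", "length", "first"}),
}


@dataclass(frozen=True)
class Structural:
    """A verbose, overly general solution: 'any object with these methods'."""
    methods: FrozenSet[str]


# A solution is either a usable type name (str) or a structural type.
Solution = Union[str, Structural]


def is_usable(sol: Solution) -> bool:
    return isinstance(sol, str)


def consistent(nominal: str, constraint: Structural) -> bool:
    # Hypothetical stand-in for re-checking the whole program: the guessed
    # nominal type must provide every method the structural solution needs.
    return constraint.methods <= CLASS_METHODS.get(nominal, frozenset())


def apply_equality_guesses(
    solutions: Dict[str, Solution],
    predicted_equal: List[Tuple[str, str]],
) -> Dict[str, Solution]:
    """Try each predicted-equal pair; keep a guess only if it is consistent."""
    result = dict(solutions)  # never mutate the caller's sound solutions
    for a, b in predicted_equal:
        # The verbose position may appear on either side of the pair.
        for verbose_pos, usable_pos in ((a, b), (b, a)):
            v, u = result[verbose_pos], result[usable_pos]
            if isinstance(v, Structural) and is_usable(u):
                if consistent(u, v):
                    result[verbose_pos] = u  # keep the guess
                # Otherwise discard it: the original solution stays in place.
                break
    return result


if __name__ == "__main__":
    sols: Dict[str, Solution] = {
        "Account#deposit(amount)": Structural(frozenset({"+", "to_s"})),
        "Account#balance(ret)": "Integer",
        "Account#label(ret)": Structural(frozenset({"upcase", "each"})),
        "Account#owner(ret)": "String",
    }
    pairs = [
        ("Account#deposit(amount)", "Account#balance(ret)"),
        ("Account#owner(ret)", "Account#label(ret)"),
    ]
    out = apply_equality_guesses(sols, pairs)
    print(out["Account#deposit(amount)"])         # expected: Integer
    print(is_usable(out["Account#label(ret)"]))   # expected: False (guess rejected)
```

In this sketch, guesses are applied in order, so a position that receives a usable type from one guess can supply it to a later pair. Because a rejected guess leaves the original solution untouched, the output never contains a type that failed the check, which loosely mirrors the soundness argument in the abstract.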
References
- Alex Aiken and Brian Murphy. 1991. Static Type Inference in a Dynamically Typed Language. In Proceedings of the 18th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’91). Association for Computing Machinery, New York, NY, USA. 279–290. isbn:0897914198 https://doi.org/10.1145/99583.99621
- Alexander Aiken, Edward L. Wimmers, and T. K. Lakshman. 1994. Soft Typing with Conditional Types. In Proceedings of the 21st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’94). ACM, New York, NY, USA. 163–173. https://doi.org/10.1145/174675.177847
- Miltiadis Allamanis, Earl T. Barr, Soline Ducousso, and Zheng Gao. 2020. Typilus: Neural Type Hints. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2020). Association for Computing Machinery, New York, NY, USA. 91–105. isbn:9781450376136 https://doi.org/10.1145/3385412.3385997
- Christopher Anderson, Paola Giannini, and Sophia Drossopoulou. 2005. Towards Type Inference for JavaScript. In ECOOP 2005 - Object-Oriented Programming (ECOOP). Springer, Berlin, Heidelberg. 428–452. https://doi.org/10.1007/11531142_19
- John Aycock. 2000. Aggressive Type Inference.
- Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language Models are Few-Shot Learners. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (Eds.). 33, Curran Associates, Inc., 1877–1901. https://proceedings.neurips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf
- Nat Budin. 2021. Journey: An online questionnaire application. https://github.com/nbudin/journey/
- Robert Cartwright and Mike Fagan. 1991. Soft Typing. In Proceedings of the ACM SIGPLAN 1991 Conference on Programming Language Design and Implementation (PLDI ’91). Association for Computing Machinery, New York, NY, USA. 278–292. https://doi.org/10.1145/113445.113469
- Code.org. 2021. The code powering code.org.
- H. B. Curry and R. Feys. 1958. Combinatory Logic, Volume I. North-Holland, Amsterdam. Second printing 1968.
- Luis Damas and Robin Milner. 1982. Principal Type-Schemes for Functional Programs. In Proceedings of the 9th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’82). Association for Computing Machinery, New York, NY, USA. 207–212. https://doi.org/10.1145/582153.582176
- Shane Emmons and Anthony Dmitriyev. 2021. Money. https://github.com/RubyMoney/money
- Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, and Ming Zhou. 2020. CodeBERT: A Pre-Trained Model for Programming and Natural Languages. In Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics, Online. 1536–1547. https://doi.org/10.18653/v1/2020.findings-emnlp.139
- Cormac Flanagan and Matthias Felleisen. 1997. Componential Set-Based Analysis. In Proceedings of the ACM SIGPLAN 1997 Conference on Programming Language Design and Implementation (PLDI ’97). Association for Computing Machinery, New York, NY, USA. 235–248. https://doi.org/10.1145/258916.258937
- Jeffrey Foster, Brianna Ren, Stephen Strickland, Alexander Yu, Milod Kazerounian, and Sankha Narayan Guria. 2018. RDL: Types, type checking, and contracts for Ruby. https://github.com/tupl-tufts/rdl
- Jeffrey S. Foster. 2021. Talks. https://github.com/jeffrey-s-foster/talks
- Michael Furr, Jong-hoon (David) An, Jeffrey S. Foster, and Michael Hicks. 2009. Static Type Inference for Ruby. In Proceedings of the 2009 ACM Symposium on Applied Computing (SAC ’09). ACM, New York, NY, USA. 1859–1866. https://doi.org/10.1145/1529282.1529700
- Brian Hackett and Shu-yu Guo. 2012. Fast and Precise Hybrid Type Inference for JavaScript. SIGPLAN Not., 47, 6 (2012), June, 239–250. issn:0362-1340 https://doi.org/10.1145/2345156.2254094
- Vincent J. Hellendoorn, Christian Bird, Earl T. Barr, and Miltiadis Allamanis. 2018. Deep Learning Type Inference. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2018). Association for Computing Machinery, New York, NY, USA. 152–162. https://doi.org/10.1145/3236024.3236051
- R. Hindley. 1969. The Principal Type-Scheme of an Object in Combinatory Logic. Trans. Amer. Math. Soc., 146 (1969), 29–60. issn:00029947 https://doi.org/10.1090/S0002-9947-1969-0253905-6
- Howard Hinnant. 2013. chrono-Compatible Low-Level Date Algorithms. https://howardhinnant.github.io/date_algorithms.html
- Civilized Discourse Construction Kit Inc. 2021. Discourse: A platform for community discussion. https://github.com/discourse/discourse
- Aditya Kanade, Petros Maniatis, Gogul Balakrishnan, and Kensen Shi. 2020. Learning and Evaluating Contextual Embedding of Source Code. In Proceedings of the 37th International Conference on Machine Learning, Hal Daumé III and Aarti Singh (Eds.) (Proceedings of Machine Learning Research, Vol. 119). PMLR, 5110–5121. http://proceedings.mlr.press/v119/kanade20a.html
- Milod Kazerounian. 2021. Personal communication.
- Milod Kazerounian, Jeffrey S. Foster, and Bonan Min. 2021. SimTyper Artifact. https://doi.org/10.5281/zenodo.5449078
- Milod Kazerounian, Sankha Narayan Guria, Niki Vazou, Jeffrey S. Foster, and David Van Horn. 2019. Type-level Computations for Ruby Libraries. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2019). ACM, New York, NY, USA. 966–979. https://doi.org/10.1145/3314221.3314630
- Milod Kazerounian, Brianna M. Ren, and Jeffrey S. Foster. 2020. Sound, Heuristic Type Annotation Inference for Ruby. In Proceedings of the 16th ACM SIGPLAN International Symposium on Dynamic Languages (DLS 2020). Association for Computing Machinery, New York, NY, USA. 112–125. isbn:9781450381758 https://doi.org/10.1145/3426422.3426985
- Diederik P Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In ICLR (Poster).
- Gregory Koch, Richard Zemel, and Ruslan Salakhutdinov. 2015. Siamese neural networks for one-shot image recognition. ICML deep learning workshop. JMLR.org, Online.
- Benjamin S. Lerner, Joe Gibbs Politz, Arjun Guha, and Shriram Krishnamurthi. 2013. TeJaS: Retrofitting Type Systems for JavaScript. In Proceedings of the 9th Symposium on Dynamic Languages (DLS). ACM, New York, NY, USA. 1–16.
- Rabee Sohail Malik, Jibesh Patra, and Michael Pradel. 2019. NL2Type: Inferring JavaScript Function Types from Natural Language Information. In Proceedings of the 41st International Conference on Software Engineering (ICSE ’19). IEEE Press, Montreal, Quebec, Canada. 304–315. https://doi.org/10.1109/ICSE.2019.00045
- Robin Milner. 1978. A theory of type polymorphism in programming. J. Comput. System Sci., 17 (1978), 348–375. https://doi.org/10.1016/0022-0000(78)90014-4
- MiniMagick. 2021. MiniMagick. https://github.com/minimagick/minimagick
- Dmitry Petrashko. 2020. Personal communication.
- Postmodern. 2021. Ronin. https://github.com/ronin-rb/ronin
- François Pottier. 1998. A Framework for Type Inference with Subtyping. In Proceedings of the Third ACM SIGPLAN International Conference on Functional Programming (ICFP ’98). Association for Computing Machinery, New York, NY, USA. 228–238. https://doi.org/10.1145/291251.289448
- Michael Pradel, Georgios Gousios, Jason Liu, and Satish Chandra. 2019. TypeWriter: Neural Type Prediction with Search-based Validation. arxiv:1912.03768.
- Veselin Raychev, Martin Vechev, and Andreas Krause. 2015. Predicting Program Properties from “Big Code”. In Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’15). Association for Computing Machinery, New York, NY, USA. 111–124. https://doi.org/10.1145/2775051.2677009
- Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China. 3982–3992. https://doi.org/10.18653/v1/D19-1410
- Brianna M. Ren and Jeffrey S. Foster. 2016. Just-in-time Static Type Checking for Dynamic Languages. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). ACM, New York, NY, USA. 462–476.
- Phil Ross. 2021. TZInfo. https://github.com/tzinfo/tzinfo
- Loren Segal. 2020. YARD: Yay! A Ruby Documentation Tool. http://yardoc.org
- Jeremy Siek and Walid Taha. 2006. Gradual typing for functional languages. In Seventh Workshop on Scheme and Functional Programming. ACM, Portland, OR, USA. 81–92.
- T. Stephen Strickland, Brianna M. Ren, and Jeffrey S. Foster. 2014. Contracts for Domain-Specific Languages in Ruby. In Proceedings of the 10th ACM Symposium on Dynamic Languages (DLS ’14). Association for Computing Machinery, New York, NY, USA. 23–34. isbn:9781450332118 https://doi.org/10.1145/2661088.2661092
- Stripe. 2020. Sorbet: A static type checker for Ruby. https://sorbet.org/
- Sam Tobin-Hochstadt and Matthias Felleisen. 2008. The Design and Implementation of Typed Scheme. In Proceedings of the 35th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL). ACM, New York, NY, USA. 395–406. https://doi.org/10.1145/1328438.1328486
- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need.
- Michael M. Vitousek, Andrew M. Kent, Jeremy G. Siek, and Jim Baker. 2014. Design and Evaluation of Gradual Typing for Python. In Proceedings of the 10th ACM Symposium on Dynamic Languages (DLS ’14). Association for Computing Machinery, New York, NY, USA. 45–56. https://doi.org/10.1145/2775052.2661101
- Andrew K. Wright and Robert Cartwright. 1997. A Practical Soft Type System for Scheme. ACM Trans. Program. Lang. Syst., 19, 1 (1997), Jan., 87–152. issn:0164-0925 https://doi.org/10.1145/239912.239917
SimTyper: sound type inference for Ruby using type equality prediction