SimTyper: sound type inference for Ruby using type equality prediction

Published: 15 October 2021
Abstract

Many researchers have explored type inference for dynamic languages. However, traditional type inference computes most general types which, for complex type systems—which are often needed to type dynamic languages—can be verbose, complex, and difficult to understand. In this paper, we introduce SimTyper, a Ruby type inference system that aims to infer usable types—specifically, nominal and generic types—that match the types programmers write. SimTyper builds on InferDL, a recent Ruby type inference system that soundly combines standard type inference with heuristics. The key novelty of SimTyper is type equality prediction, a new, machine learning-based technique that predicts when method arguments or returns are likely to have the same type. SimTyper finds pairs of positions that are predicted to have the same type yet one has a verbose, overly general solution and the other has a usable solution. It then guesses the two types are equal, keeping the guess if it is consistent with the rest of the program, and discarding it if not. In this way, types inferred by SimTyper are guaranteed to be sound. To perform type equality prediction, we introduce the deep similarity (DeepSim) neural network. DeepSim is a novel machine learning classifier that follows the Siamese network architecture and uses CodeBERT, a pre-trained model, to embed source tokens into vectors that capture tokens and their contexts. DeepSim is trained on 100,000 pairs labeled with type similarity information extracted from 371 Ruby programs with manually documented, but not checked, types. We evaluated SimTyper on eight Ruby programs and found that, compared to standard type inference, SimTyper finds 69% more types that match programmer-written type information. Moreover, DeepSim can predict rare types that appear neither in the Ruby standard library nor in the training data. Our results show that type equality prediction can help type inference systems effectively produce more usable types.
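The guess-and-check loop described in the abstract can be illustrated with a small Python sketch. This is not SimTyper's implementation: the `Position` class, the `CLASS_METHODS` table, the `consistent` check, and the name-overlap `name_similarity` predictor are all hypothetical stand-ins (the last one takes the place of DeepSim). The sketch only shows the control flow: for a position without a usable type, find a position that is predicted equal and does have one, guess that type, and keep the guess only if a consistency check accepts it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional

# Hypothetical table of nominal types and the methods each one provides.
CLASS_METHODS: Dict[str, FrozenSet[str]] = {
    "String": frozenset({"upcase", "length", "+"}),
    "Integer": frozenset({"+", "-", "abs"}),
}


@dataclass
class Position:
    """A method argument or return (toy model, not SimTyper's data structure)."""
    name: str
    required: FrozenSet[str]       # methods the program calls on this position
    nominal: Optional[str] = None  # usable nominal type, if inference found one


def consistent(pos: Position, guess: str) -> bool:
    # Toy stand-in for re-checking the program: the guessed class must
    # provide every method that is called on the position.
    return pos.required <= CLASS_METHODS.get(guess, frozenset())


def name_similarity(a: Position, b: Position) -> float:
    # Stand-in for a learned predictor: Jaccard overlap of name tokens.
    ta, tb = set(a.name.split("_")), set(b.name.split("_"))
    return len(ta & tb) / len(ta | tb)


def refine(
    positions: List[Position],
    predict_equal: Callable[[Position, Position], float],
    threshold: float,
) -> Dict[str, Optional[str]]:
    # Only positions that already have a usable type can supply a guess.
    sources = [p for p in positions if p.nominal is not None]
    for target in positions:
        if target.nominal is not None:
            continue
        for source in sources:
            if predict_equal(target, source) < threshold:
                continue
            # Guess the two types are equal; keep the guess only if it
            # passes the consistency check, otherwise discard it.
            if consistent(target, source.nominal):
                target.nominal = source.nominal
                break
    return {p.name: p.nominal for p in positions}


positions = [
    Position("user_name", frozenset({"upcase"}), "String"),
    Position("item_count", frozenset({"abs"}), "Integer"),
    Position("display_name", frozenset({"length"})),
    Position("name_count", frozenset({"abs"})),
    Position("payload", frozenset({"to_json"})),
]
result = refine(positions, name_similarity, threshold=0.3)
```

In this toy run, `name_count` is first paired with `user_name`, but the `String` guess is discarded because `String` does not provide `abs` in the table; the later `Integer` guess is kept. Because every guess passes through the check before it is accepted, a wrong prediction can cost a usable type but cannot introduce an inconsistent one, which mirrors the soundness argument in the abstract.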

Supplemental Material

Auxiliary Presentation Video

This is a presentation video of my talk at OOPSLA 2021 on the paper "SimTyper: Sound Type Inference for Ruby using Type Equality Prediction."
