Abstract
High-level languages are growing in popularity. However, decades of C software development have produced large libraries of fast, time-tested, meritorious code that are impractical to recreate from scratch. Cross-language bindings can expose low-level C code to high-level languages. Unfortunately, writing bindings by hand is tedious and error-prone, while mainstream binding generators require extensive manual annotation or fail to offer the language features that users of modern languages have come to expect.
We present an improved binding-generation strategy based on static analysis of unannotated library source code. We characterize three high-level idioms that are not uniquely expressible in C's low-level type system: array parameters, resource managers, and multiple return values. We describe a suite of interprocedural analyses that recover this high-level information, and we show how the results can be used in a binding generator for the Python programming language. In experiments with four large C libraries, we find that our approach avoids the mistakes characteristic of hand-written bindings while offering a level of Python integration unmatched by prior automated approaches. Among the thousands of functions in the public interfaces of these libraries, roughly 40% exhibit the behaviors detected by our static analyses.
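To make two of these idioms concrete, the following is a minimal sketch (not the output of the paper's generator) of what an idiomatic Python wrapper looks like for a C function that takes an array plus its length and returns a second result through an out-parameter. The C signature `int stats(const int *xs, size_t n, int *out_max)` and the names `c_stats` and `stats` are hypothetical; the C side is simulated with a `ctypes` callback so that the example runs without a compiled library.

```python
import ctypes

# Hypothetical C signature being simulated (an assumption, not from the paper):
#   int stats(const int *xs, size_t n, int *out_max);
# It returns the sum of xs[0..n-1] and writes the maximum through out_max.
_PROTO = ctypes.CFUNCTYPE(
    ctypes.c_int,                  # return value: sum
    ctypes.POINTER(ctypes.c_int),  # xs: array parameter
    ctypes.c_size_t,               # n: length of xs
    ctypes.POINTER(ctypes.c_int),  # out_max: second return value
)


def _stats_impl(xs, n, out_max):
    """Python stand-in for the C body; receives raw ctypes pointers."""
    total = 0
    best = xs[0]
    for i in range(n):
        total += xs[i]
        best = max(best, xs[i])
    out_max[0] = best
    return total


# A C-callable function object, used exactly like a symbol loaded from a library.
c_stats = _PROTO(_stats_impl)


def stats(values):
    """Idiomatic wrapper: takes a Python sequence, returns a (sum, max) tuple.

    The array length is derived from the sequence, and the out-parameter
    becomes a second return value instead of a caller-managed pointer.
    """
    n = len(values)
    if n == 0:
        raise ValueError("stats() requires at least one value")
    arr = (ctypes.c_int * n)(*values)   # marshal the sequence into a C array
    out_max = ctypes.c_int()            # storage for the out-parameter
    total = c_stats(arr, n, ctypes.byref(out_max))
    return total, out_max.value
```

Calling `stats([3, 9, 4])` hides the pointer, the length argument, and the out-parameter from the caller. Writing such a wrapper by hand requires knowing that `xs` is an array of length `n` and that `out_max` is an output, which is the kind of information the abstract says is not uniquely expressible in C's type system.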
Automatic generation of library bindings using static analysis
PLDI '09: Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation