Abstract
When programmers encounter an unfamiliar API library, they often need to consult its documentation, tutorials, or discussions on development forums to learn its proper usage. These API documents contain valuable information, but they can also mislead programmers because they may contain errors (e.g., broken code names and obsolete code samples). Although most API documents are actively maintained and updated, studies show that many new and latent errors still exist. Finding such errors manually is tedious and error-prone, since API documents can run to thousands of pages. Existing tools are ineffective at locating documentation errors: traditional natural language (NL) tools do not understand code names and code samples, and traditional code analysis tools do not understand NL sentences. In this paper, we propose DOCREF, the first approach specifically designed and developed to detect API documentation errors. We formulate a class of inconsistencies that indicate potential documentation errors, and we combine NL and code analysis techniques to detect and report such inconsistencies. We have implemented DOCREF and evaluated its effectiveness on the latest documentation of five widely used API libraries. DOCREF has detected more than 1,000 new documentation errors, which we have reported to the authors. Many of these errors have already been confirmed and fixed.
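To make the idea of a documentation-code inconsistency concrete, the sketch below flags code-like names in documentation text that the API does not declare. It is an illustration only and not DOCREF's implementation: the regular expression, the `API_NAMES` set, and the function `find_unresolved_names` are all assumptions introduced here, and a real tool would derive the declared names from the library itself and use far more robust NL and code analysis.

```python
import re

# Hypothetical set of names declared by the documented API.
# A real tool would extract these from the library's source or binaries.
API_NAMES = {"HashMap", "HashMap.put", "HashMap.get", "ArrayList", "ArrayList.add"}

# Heuristic (an assumption of this sketch): a code name is a CamelCase token
# with at least two capitalized parts, optionally followed by ".member".
CODE_NAME = re.compile(r"\b[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]+)+(?:\.[a-z]\w*)?\b")


def find_unresolved_names(doc_text, api_names):
    """Return code-like names mentioned in doc_text that api_names lacks."""
    unresolved = []
    for match in CODE_NAME.finditer(doc_text):
        name = match.group(0)
        # Report each unknown name once, in order of first appearance.
        if name not in api_names and name not in unresolved:
            unresolved.append(name)
    return unresolved


SAMPLE_DOC = (
    "Use HashMap.put to store entries and HashMap.fetch to read them. "
    "An ArrayList keeps insertion order."
)

if __name__ == "__main__":
    for name in find_unresolved_names(SAMPLE_DOC, API_NAMES):
        print("possible broken code name:", name)
```

On the sample text, only `HashMap.fetch` is reported, because `HashMap.put` and `ArrayList` resolve against the declared names. A purely lexical check like this one would miss obsolete code samples and would misfire on ordinary CamelCase words, which is why the abstract stresses combining NL and code analysis.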
OOPSLA '13: Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications