research-article

Detecting API documentation errors

Published: 29 October 2013

Abstract

When programmers encounter an unfamiliar API library, they often refer to its documentation, tutorials, or discussions on development forums to learn its proper usage. These API documents contain valuable information but may also mislead programmers, because they can contain errors (e.g., broken code names and obsolete code samples). Although most API documents are actively maintained and updated, studies show that many new and latent errors exist. Finding such errors manually is tedious and error-prone, as API documents can be enormous, spanning thousands of pages. Existing tools are ineffective at locating documentation errors because traditional natural language (NL) tools do not understand code names and code samples, and traditional code analysis tools do not understand NL sentences. In this paper, we propose DOCREF, the first approach specifically designed and developed to detect API documentation errors. We formulate a class of inconsistencies that indicate potential documentation errors, and combine NL and code analysis techniques to detect and report such inconsistencies. We have implemented DOCREF and evaluated its effectiveness on the latest documentation of five widely used API libraries. DOCREF has detected more than 1,000 new documentation errors, which we have reported to the authors. Many of these errors have since been confirmed and fixed.
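
To make the idea of a "broken code name" concrete, the following is a minimal sketch of one kind of inconsistency check: find code-like tokens in a documentation sentence and flag those that the API does not declare. It is not DOCREF's implementation, which the abstract does not describe in detail. The regular-expression heuristic, the function name `find_broken_code_names`, and the set `KNOWN_API_NAMES` are all assumptions made for illustration.

```python
import re

# Hypothetical set of names declared by the API being documented.
KNOWN_API_NAMES = {"ArrayList", "add", "remove", "size"}

# Assumed heuristic: a token followed by "()" or a CamelCase token with at
# least two capitalized parts is treated as a code name rather than an NL word.
CODE_NAME = re.compile(
    r"\b([A-Za-z_][A-Za-z0-9_]*)\(\)"            # e.g. add()
    r"|\b([A-Z][a-z0-9]+(?:[A-Z][a-z0-9]+)+)\b"  # e.g. ArrayList
)


def find_broken_code_names(doc_text, known_names):
    """Return code-like names in doc_text that are not in known_names."""
    broken = []
    for match in CODE_NAME.finditer(doc_text):
        name = match.group(1) or match.group(2)
        if name not in known_names and name not in broken:
            broken.append(name)
    return broken


sentence = "Call add() to append to an ArrayList, then call getLength() to count elements."
print(find_broken_code_names(sentence, KNOWN_API_NAMES))
```

In this sketch, `getLength` is reported because the assumed API declares `size` instead. A real tool would also need to resolve names against actual library declarations and handle code samples, which this example does not attempt.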



        • Published in

          ACM SIGPLAN Notices, Volume 48, Issue 10
          OOPSLA '13
          October 2013
          867 pages
          ISSN: 0362-1340
          EISSN: 1558-1160
          DOI: 10.1145/2544173
          • ACM Conferences
            OOPSLA '13: Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications
            October 2013
            904 pages
            ISBN: 9781450323741
            DOI: 10.1145/2509136

          Copyright © 2013 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States
