ExceLint: automatically finding spreadsheet formula errors

Published: 24 October 2018
Related Artifact: ExceLint software artifact, https://doi.org/10.1145/3276934
Abstract

Spreadsheets are among the most widely used programming environments and are deployed extensively in domains such as finance, where errors can have catastrophic consequences. We present a static analysis specifically designed to find spreadsheet formula errors. Our analysis directly leverages the rectangular character of spreadsheets. It uses an information-theoretic approach to identify formulas that are especially surprising disruptions to nearby rectangular regions. We present ExceLint, an implementation of our static analysis for Microsoft Excel. We demonstrate that ExceLint is fast and effective: across a corpus of 70 spreadsheets, ExceLint takes a median of 8 seconds per spreadsheet, and it significantly outperforms the state-of-the-art analysis.
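
The information-theoretic idea can be illustrated with a small sketch. It is not ExceLint's algorithm, only a simplified stand-in under stated assumptions: each cell is reduced to a hypothetical label describing its formula's reference pattern, the grid's Shannon entropy is computed over those labels, and a cell is considered surprising if giving it an adjacent cell's label lowers the entropy. The function names `entropy` and `rank_surprising_cells` and the labels `"A"` and `"B"` are invented for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of hashable labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def rank_surprising_cells(grid):
    """Rank cells by the entropy drop obtained when a cell adopts a neighbor's label.

    `grid` is a rectangular list of lists of labels (an assumed encoding of
    formula reference patterns). Only cells with a positive drop are returned.
    """
    rows, cols = len(grid), len(grid[0])
    flat = [grid[r][c] for r in range(rows) for c in range(cols)]
    base = entropy(flat)
    results = []
    for r in range(rows):
        for c in range(cols):
            best = 0.0
            # Try the labels of the four adjacent cells as candidate "fixes".
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != grid[r][c]:
                    trial = list(flat)
                    trial[r * cols + c] = grid[nr][nc]
                    best = max(best, base - entropy(trial))
            if best > 0:
                results.append(((r, c), best))
    return sorted(results, key=lambda item: item[1], reverse=True)


# A 3x3 block of identical formula patterns with one deviating cell.
grid = [
    ["A", "A", "A"],
    ["A", "B", "A"],
    ["A", "A", "A"],
]
ranked = rank_surprising_cells(grid)
```

Here only the center cell is reported, because changing any other cell to `"B"` would raise the entropy rather than lower it. A real analysis would also have to account for the rectangular layout of regions, which this sketch ignores.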

Supplemental Material

a148-barrow.webm

