DOI: 10.1145/3510003.3510102 · ICSE Conference Proceedings · Research article

Automated patching for unreproducible builds

Published: 05 July 2022

ABSTRACT

Software reproducibility plays an essential role in establishing trust between source code and built artifacts: independent users compile the same source and compare their outputs. Although testing for unreproducible builds can be automated, fixing them remains challenging. We consider two challenges the most significant: localization granularity and the utilization of historical knowledge. To tackle them, we propose RepFix, a novel approach that combines tracing-based fine-grained localization with history-based patch generation.
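
As a rough illustration of the comparison step mentioned above, the following minimal sketch checks whether two independently built artifacts are bit-for-bit identical by comparing SHA-256 digests. The function names are hypothetical and are not part of RepFix; real reproducibility testing typically also reports where the artifacts differ.

```python
import hashlib
from pathlib import Path


def artifact_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a build artifact, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def is_reproducible(artifact_a: Path, artifact_b: Path) -> bool:
    """Treat a build as reproducible when both artifacts hash identically.

    Assumption: bit-for-bit identity is the criterion being checked.
    """
    return artifact_digest(artifact_a) == artifact_digest(artifact_b)
```

Calling `is_reproducible(Path("build1/pkg.tar"), Path("build2/pkg.tar"))` (hypothetical paths) returns `True` only when the two files have identical contents.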

To tackle the localization granularity challenge, we adopt system-level dynamic tracing to capture both system call traces and user-space function call information. By integrating kernel probes and user-space probes, we can determine the location of each executed build command more accurately. To tackle the historical knowledge utilization challenge, we design a similarity-based mechanism for retrieving relevant patches, and generate new patches by applying the edit operations of existing ones. With the abundant patches accumulated by the reproducible builds practice, we can generate patches that fix unreproducible builds automatically.
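
The retrieve-then-replay idea can be sketched in a few lines. This is not RepFix's implementation: it assumes build commands are compared as token lists, uses Python's `difflib.SequenceMatcher` as a stand-in similarity measure, and works on a tiny, made-up `HISTORY` of (unreproducible command, fixed command) pairs. The names `HISTORY`, `retrieve`, and `transfer_edit` are hypothetical.

```python
import difflib
import shlex

# Hypothetical history of past fixes: (unreproducible command, fixed command).
HISTORY = [
    ("gzip -9 manual.txt", "gzip -9 -n manual.txt"),
    ("tar -cf out.tar src", "tar --sort=name --mtime=@0 -cf out.tar src"),
]


def similarity(a: str, b: str) -> float:
    """Token-level similarity in [0, 1] between two shell commands."""
    return difflib.SequenceMatcher(None, shlex.split(a), shlex.split(b)).ratio()


def retrieve(command: str, history=HISTORY):
    """Return the historical (before, after) pair most similar to `command`."""
    return max(history, key=lambda pair: similarity(command, pair[0]))


def transfer_edit(command: str, before: str, after: str) -> str:
    """Replay the token edits that turned `before` into `after` onto `command`."""
    src, dst, out = shlex.split(before), shlex.split(after), shlex.split(command)
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, src, dst).get_opcodes():
        if tag == "equal":
            continue
        if tag in ("delete", "replace"):
            # Drop tokens that the historical fix removed, if they occur here.
            for tok in src[i1:i2]:
                if tok in out:
                    out.remove(tok)
        if tag in ("insert", "replace"):
            # Insert right after the token that preceded the edit in `before`;
            # fall back to the end of the command if that token is absent.
            if i1 == 0:
                pos = 0
            elif src[i1 - 1] in out:
                pos = out.index(src[i1 - 1]) + 1
            else:
                pos = len(out)
            out[pos:pos] = dst[j1:j2]
    return shlex.join(out)


def suggest_patch(command: str) -> str:
    """Retrieve the closest historical fix and apply its edits to `command`."""
    before, after = retrieve(command)
    return transfer_edit(command, before, after)
```

For example, `suggest_patch("gzip -9 changelog")` retrieves the `gzip` pair and yields `gzip -9 -n changelog`, carrying over the `-n` flag that stops gzip from embedding the original file name and timestamp. A real system would need a richer similarity measure and would validate each candidate patch by rebuilding.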

To evaluate the usefulness of RepFix, we conduct extensive experiments over a dataset of 116 real-world packages. RepFix successfully fixes the unreproducible build issues of 64 packages. Moreover, we apply RepFix to Arch Linux packages and successfully fix four of them. Two patches have been accepted by the repository, and for one package the patch has been pushed to and accepted by its upstream repository, so the fix can also benefit other downstream repositories.
