skip to main content
research-article
Open Access

Verifying quantitative reliability for programs that execute on unreliable hardware

Published:29 October 2013Publication History
Skip Abstract Section

Abstract

Emerging high-performance architectures are anticipated to contain unreliable components that may exhibit soft errors, which silently corrupt the results of computations. Full detection and masking of soft errors is challenging, expensive, and, for some applications, unnecessary. For example, approximate computing applications (such as multimedia processing, machine learning, and big data analytics) can often naturally tolerate soft errors.

We present Rely a programming language that enables developers to reason about the quantitative reliability of an application -- namely, the probability that it produces the correct result when executed on unreliable hardware. Rely allows developers to specify the reliability requirements for each value that a function produces.

We present a static quantitative reliability analysis that verifies quantitative requirements on the reliability of an application, enabling a developer to perform sound and verified reliability engineering. The analysis takes a Rely program with a reliability specification and a hardware specification that characterizes the reliability of the underlying hardware components and verifies that the program satisfies its reliability specification when executed on the underlying unreliable hardware platform. We demonstrate the application of quantitative reliability analysis on six computations implemented in Rely.

References

  1. J. Ansel, C. Chan, Y. L. Wong, M. Olszewski, Q. Zhao, A. Edelman, and Amarasinghe. PetaBricks: A language and compiler for algorithmic choice. PLDI, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. W. Baek and T. M. Chilimbi. Green: A framework for supporting energy-conscious programming using controlled approximation. PLDI, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. T. Bao, Y. Zheng, and X. Zhang. White box sampling in uncertain data processing enabled by program analysis. In OOPSLA, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. G. Barthe, D. Demange, and D. Pichardie. A formally verified ssa-based middle-end: Static single assignment meets compcert. ESOP, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. G. Barthe, B. Grégoire, and S. Zanella Béguelin. Formal certification of code-based cryptographic proofs. POPL, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. G. Barthe, B. Köpf, F. Olmedo, and S. Zanella Béguelin. Probabilistic reasoning for differential privacy. POPL, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. M. Blum and S. Kanna. Designing programs that check their work. STOC, 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. M. Blum, M. Luby, and R. Rubinfeld. Self-testing/correcting with applications to numerical problems. Journal of computer and system sciences, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. F. Cappello, A. Geist, B. Gropp, L. Kale, B. Kramer, and M. Snir. Toward exascale resilience. International Journal of High Performance Computing Applications, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. M. Carbin, D. Kim, S. Misailovic, and M. Rinard. Proving acceptability properties of relaxed nondeterministic approximate programs. PLDI, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. M. Carbin, D. Kim, S. Misailovic, and M. Rinard. Verified integrity properties for safe approximate program transformations. PEPM, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. M. Carbin and M. Rinard. Automatically identifying critical input regions and code in applications. ISSTA, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. L. Chakrapani, B. Akgul, S. Cheemalavagu, P. Korkmaz, K. Palem, and B. Seshasayee. Ultra-efficient (embedded) soc architectures based on probabilistic cmos (pcmos) technology. DATE, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. S. Chaudhuri, S. Gulwani, R. Lublinerman, and S. Navidpour. Proving programs robust. FSE, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. P. Cousot and M. Monerau. Probabilistic abstract interpretation. ESOP, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. M. de Kruijf, S. Nomura, and K. Sankaralingam. Relax: an architectural framework for software recovery of hardware faults. ISCA, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. A. Di Pierro and H. Wiklicky. Concurrent constraint programming: Towards probabilistic abstract interpretation. PPDP'00. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. E. W. Dijkstra. Guarded commands, nondeterminacy and formal derivation of programs. CACM, 18(8), August 1975. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. D. Ernst, N. S. Kim, S. Das, S. Pant, R. Rao, T. Pham, C. Ziesler, D. Blaauw, T. Austin, K. Flautner, and T. Mudge. Razor: A low-power pipeline based on circuit-level timing speculation. MICRO, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. H. Esmaeilzadeh, A. Sampson, L. Ceze, and D. Burger. Architecture support for disciplined approximate programming. ASPLOS, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. H. Esmaeilzadeh, A. Sampson, L. Ceze, and D. Burger. Neural acceleration for general-purpose approximate programs. MICRO, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. S. Feng, S. Gupta, A. Ansari, and S. Mahlke. Shoestring: probabilistic soft error reliability on the cheap. ASPLOS'10. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. A. Filieri, C. P\uas\uareanu, and W. Visser. Reliability analysis in symbolic pathfinder. In ICSE, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. M. Hiller, A. Jhumka, and N. Suri. On the placement of software mechanisms for detection of data errors. DSN, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. H. Hoffmann, S. Sidiroglou, M. Carbin, S. Misailovic, A. Agarwal, and M. Rinard. Dynamic knobs for responsive power-aware computing. ASPLOS, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. M. Kling, S. Misailovic, M. Carbin, and M. Rinard. Bolt: on-demand infinite loop escape in unmodified binaries. OOPSLA, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. K. Knobe and V. Sarkar. Array ssa form and its use in parallelization. POPL, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. D. Kozen. Semantics of probabilistic programs. Journal of Computer and System Sciences, 1981.Google ScholarGoogle ScholarCross RefCross Ref
  29. K. Lee, A. Shrivastava, I. Issenin, N. Dutt, and N. Venkatasubramanian. Mitigating soft error failures for multimedia applications by selective data protection. CASES, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. L. Leem, H. Cho, J. Bau, Q. Jacobson, and S. Mitra. Ersa: error resilient system architecture for probabilistic applications. DATE, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. N. Leveson, S. Cha, J. C. Knight, and T. Shimeall. The use of self checks and voting in software error detection: An empirical study. IEEE TSE, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. N. Leveson and P. Harvey. Software fault tree analysis. Journal of Systems and Software, 3(2), 1983. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. X. Li and D. Yeung. Application-level correctness and its impact on fault tolerance. HPCA, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. S. Liu, K. Pattabiraman, T. Moscibroda, and B. Zorn. Flikker: saving dram refresh-power through critical data partitioning. ASPLOS, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. M. Rinard M. Carbin, S. Misailovic. Verifying quantitative reliability of programs that execute on unreliable hardware (appendix). http://groups.csail.mit.edu/pac/rely. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. M. Rinard M. Carbin, S. Misailovic. Verifying quantitative reliability of programs that execute on unreliable hardware. Technical Report MIT-CSAIL-TR-2013-014, MIT, 2013.Google ScholarGoogle Scholar
  37. Xiph.org Video Test Media. http://media.xiph.org/video/derf.Google ScholarGoogle Scholar
  38. J. Meng, A. Raghunathan, S. Chakradhar, and S. Byna. Exploiting the forgiving nature of applications for scalable parallel execution. In IPDPS, 2010.Google ScholarGoogle Scholar
  39. S. Misailovic, D. Kim, and M. Rinard. Parallelizing sequential programs with statistical accuracy tests. ACM TECS Special Issue on Probabilistic Embedded Computing, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. S. Misailovic, D. Roy, and M. Rinard. Probabilistically accurate program transformations. SAS, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. S. Misailovic, S. Sidiroglou, H. Hoffmann, and M. Rinard. Quality of service profiling. ICSE, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. D. Monniaux. Abstract interpretation of probabilistic semantics. SAS, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. C. Morgan, A. McIver, and K. Seidel. Probabilistic predicate transformers. TOPLAS, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. D. Murta and J. N. Oliveira. Calculating fault propagation in functional programs. Technical report, Univ. Minho, 2013.Google ScholarGoogle Scholar
  45. S. Narayanan, J. Sartori, R. Kumar, and D. Jones. Scalable stochastic processors. DATE, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. J. Nelson, A. Sampson, and L. Ceze. Dense approximate storage in phase-change memory. ASPLOS Ideas & Perspectives, 2011.Google ScholarGoogle Scholar
  47. K. Pattabiraman, V. Grover, and B. Zorn. Samurai: protecting critical data in unsafe languages. EuroSys, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. J. Perkins, S. Kim, S. Larsen, S. Amarasinghe, J. Bachrach, M. Carbin, C. Pacheco, F. Sherwood, S. Sidiroglou, G. Sullivan, W. Wong, Y. Zibin, M. Ernst, and M. Rinard. Automatically patching errors in deployed software. SOSP, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. F. Perry, L. Mackey, G.A. Reis, J. Ligatti, D.I. August, and D. Walker. Fault-tolerant typed assembly language. PLDI'07. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. F. Perry and D. Walker. Reasoning about control flow in the presence of transient faults. SAS, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. P. Prata and J. Silva. Algorithm based fault tolerance versus result-checking for matrix computations. FTCS, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. J. Reed and B. Pierce. Distance makes the types grow stronger: a calculus for differential privacy. ICFP, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. G. Reis, J. Chang, N. Vachharajani, R. Rangan, and D. August. Swift: Software implemented fault tolerance. CGO'05. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. M. Rinard. Probabilistic accuracy bounds for fault-tolerant computations that discard tasks. ICS, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. M. Rinard. Using early phase termination to eliminate load imbalances at barrier synchronization points. OOPSLA, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. M. Rinard, C. Cadar, D. Dumitran, D.M. Roy, T. Leu, and W.S. Beebee Jr. Enhancing server availability and security through failure-oblivious computing. OSDI, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. M. Rinard, C. Cadar, and H. Nguyen. Exploring the acceptability envelope. OOPSLA, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. M. Rinard, H. Hoffmann, S. Misailovic, and S. Sidiroglou. Patterns and statistical analysis for understanding reduced resource computing. 2010.Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. A. Sampson, W. Dietl, E. Fortuna, D. Gnanapragasam, L. Ceze, and D. Grossman. Enerj: Approximate data types for safe and general low-power computation. PLDI, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. S. Sankaranarayanan, A. Chakarov, and S. Gulwani. Static analysis for probabilistic programs: inferring whole program properties from finitely many paths. In PLDI, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. C. Schlesinger, K. Pattabiraman, N. Swamy, D. Walker, and B. Zorn. Yarra: An extension to c for data integrity and partial safety. CSF, 2011.Google ScholarGoogle Scholar
  62. P. Shivakumar, M. Kistler, S.W. Keckler, D. Burger, and L. Alvisi. Modeling the effect of technology trends on the soft error rate of combinational logic. DSN, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. S. Sidiroglou, S. Misailovic, H. Hoffmann, and M. Rinard. Managing performance vs.\ accuracy trade-offs with loop perforation. FSE, 2011.Google ScholarGoogle ScholarDigital LibraryDigital Library
  64. M. Smith. Probabilistic abstract interpretation of imperative programs using truncated normal distributions. Electronic\,Notes\,in\,Theoretical\,Computer\,Science, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  65. W. N. Sumner, T. Bao, X. Zhang, and S. Prabhakar. Coalescing executions for fast uncertainty analysis. In ICSE, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  66. A. Thomas and K. Pattabiraman. Error detector placement for soft computation. DSN, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  67. x264. http://www.videolan.org/x264.html.Google ScholarGoogle Scholar
  68. Z. Zhu, S. Misailovic, J. Kelner, and M. Rinard. Randomized accuracy-aware program transformations for efficient approximate computations. POPL, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Verifying quantitative reliability for programs that execute on unreliable hardware

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in

          Full Access

          • Published in

            cover image ACM SIGPLAN Notices
            ACM SIGPLAN Notices  Volume 48, Issue 10
            OOPSLA '13
            October 2013
            867 pages
            ISSN:0362-1340
            EISSN:1558-1160
            DOI:10.1145/2544173
            Issue’s Table of Contents
            • cover image ACM Conferences
              OOPSLA '13: Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications
              October 2013
              904 pages
              ISBN:9781450323741
              DOI:10.1145/2509136

            Copyright © 2013 Owner/Author

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 29 October 2013

            Check for updates

            Qualifiers

            • research-article

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader
          About Cookies On This Site

          We use cookies to ensure that we give you the best experience on our website.

          Learn more

          Got it!