DOI: 10.1145/1250734.1250741

Fault-tolerant typed assembly language

Published: 10 June 2007

ABSTRACT

A transient hardware fault occurs when an energetic particle strikes a transistor, causing it to change state. Although transient faults do not permanently damage the hardware, they may corrupt computations by altering stored values and signal transfers. In this paper, we propose a new scheme for provably safe and reliable computing in the presence of transient hardware faults. In our scheme, software computations are replicated to provide redundancy while special instructions compare the independently computed results to detect errors before writing critical data. In stark contrast to any previous efforts in this area, we have analyzed our fault tolerance scheme from a formal, theoretical perspective. To be specific, first, we provide an operational semantics for our assembly language, which includes a precise formal definition of our fault model. Second, we develop an assembly-level type system designed to detect reliability problems in compiled code. Third, we provide a formal specification for program fault tolerance under the given fault model and prove that all well-typed programs are indeed fault tolerant. In addition to the formal analysis, we evaluate our detection scheme and show that it only takes 34% longer to execute than the unreliable version.
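The replicate-then-compare idea in the abstract can be illustrated with a minimal sketch. This is not the paper's assembly language or its special instructions; `compute`, `checked_store`, `run`, `FaultDetected`, and the `flip_bit` parameter are all hypothetical names chosen for illustration, and the single injected bit flip is only a stand-in for the paper's formally defined fault model.

```python
class FaultDetected(Exception):
    """Raised when the two redundant copies of a value disagree."""


def compute(x):
    # Stand-in for any deterministic computation whose result is critical.
    return 2 * x + 1


def checked_store(memory, addr, copy_a, copy_b):
    # Compare the independently computed copies before the write commits.
    if copy_a != copy_b:
        raise FaultDetected(f"copies disagree: {copy_a} != {copy_b}")
    memory[addr] = copy_a


def run(memory, x, flip_bit=None):
    copy_a = compute(x)          # first replica
    copy_b = compute(x)          # second replica, computed independently
    if flip_bit is not None:
        copy_b ^= 1 << flip_bit  # simulate one transient bit flip in one replica
    checked_store(memory, 0, copy_a, copy_b)
```

In this sketch a fault in either replica is caught at the store, so corrupted data never reaches `memory`; the cost is that every critical value is computed twice.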


Published in

PLDI '07: Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2007
508 pages
ISBN: 9781595936332
DOI: 10.1145/1250734

ACM SIGPLAN Notices, Volume 42, Issue 6
Proceedings of the 2007 PLDI conference
June 2007
491 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/1273442

Copyright © 2007 ACM

Publisher

Association for Computing Machinery, New York, NY, United States


Qualifiers: Article

Overall acceptance rate: 406 of 2,067 submissions, 20%
