Abstract
Researchers have recently designed a number of application-specific fault tolerance mechanisms. These mechanisms either make an application naturally resilient to errors or add detection and correction steps that bring its overall execution back into an envelope for which an acceptable execution is eventually guaranteed. A major challenge in building an application that leverages these mechanisms, however, is verifying that the implementation satisfies the basic invariants the mechanisms require, given a model of how faults may manifest during the application's execution.
To this end, we present Leto, an SMT-based automatic verification system that enables developers to verify their applications with respect to an execution model specification. Specifically, Leto enables software and platform developers to programmatically specify the execution semantics of the underlying hardware system and to verify assertions about the behavior of the application's resulting execution. In this paper, we present the Leto programming language and its corresponding verification system. We also demonstrate Leto on several applications that leverage application-specific fault tolerance mechanisms.
Index Terms
Leto: verifying application-specific hardware fault tolerance with programmable execution models