
Error Detector Placement for Soft Computing Applications

Published: 13 January 2016

Abstract

The scaling of silicon devices has exacerbated the unreliability of modern computer systems, and power constraints have necessitated the involvement of software in hardware error detection. At the same time, emerging workloads in the form of soft computing applications (e.g., multimedia applications) can tolerate most hardware errors as long as the erroneous outputs do not deviate significantly from error-free outcomes. We call outcomes that deviate significantly from the error-free outcomes Egregious Data Corruptions (EDCs).
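
To make the EDC definition concrete, the following is a minimal sketch of how a faulty run's output might be labeled by comparing it with an error-free ("golden") output. The relative-deviation measure, the function name `classify_outcome`, and the 10% threshold are illustrative assumptions; the abstract does not specify the fidelity metric or cutoff actually used.

```python
def classify_outcome(golden, observed, edc_threshold=0.10):
    """Label a run as 'benign', 'non-EDC', or 'EDC'.

    golden / observed: sequences of numeric output values.
    edc_threshold: hypothetical cutoff on relative deviation (assumption).
    """
    if len(golden) != len(observed):
        # Structurally different output: treated as egregious in this sketch.
        return "EDC"
    # Normalize the total absolute error by the magnitude of the golden output.
    denom = sum(abs(g) for g in golden) or 1.0
    deviation = sum(abs(g - o) for g, o in zip(golden, observed)) / denom
    if deviation == 0:
        return "benign"
    return "EDC" if deviation > edc_threshold else "non-EDC"


if __name__ == "__main__":
    golden = [1.0, 2.0, 3.0]
    print(classify_outcome(golden, [1.0, 2.0, 3.0]))
    print(classify_outcome(golden, [1.0, 2.0, 3.1]))
    print(classify_outcome(golden, [1.0, 2.0, 9.0]))
```

In practice a domain-specific fidelity measure (for example, an image-quality metric for multimedia output) would likely replace the simple relative deviation used here.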

In this study, we propose a technique to place detectors for selectively detecting EDC-causing errors in an application. We performed an initial study to formulate heuristics that identify EDC-causing data. Based on these heuristics, we developed an algorithm that uses static analysis to identify program locations for placing high-coverage detectors for EDCs. Our technique achieves an average EDC coverage of 82% at a performance overhead of 10%, while detecting 10% of the non-EDC and benign faults. We also evaluate the error resilience of these applications under 14 compiler optimizations.
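
As an illustration of how coverage figures like those above could be derived, the sketch below computes per-class detection rates from a list of fault-injection records. The record layout (an outcome label paired with a detected flag) and the function name `detection_rates` are assumptions made for this example, not the authors' tooling.

```python
from collections import Counter


def detection_rates(runs):
    """Return the fraction of runs flagged by a detector, per outcome class.

    runs: iterable of (outcome, detected) pairs, where outcome is a label
    such as 'EDC', 'non-EDC', or 'benign' and detected is a bool.
    """
    total = Counter()
    caught = Counter()
    for outcome, detected in runs:
        total[outcome] += 1
        if detected:
            caught[outcome] += 1
    return {label: caught[label] / total[label] for label in total}


if __name__ == "__main__":
    # Hypothetical campaign: 5 EDC runs (4 detected), 10 non-EDC runs (1 detected).
    runs = [("EDC", True)] * 4 + [("EDC", False)]
    runs += [("non-EDC", True)] + [("non-EDC", False)] * 9
    print(detection_rates(runs))
```

A selective detector aims for a high rate in the EDC class and a low rate in the other classes, since detections of tolerable errors only add recovery cost.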

