Abstract
Computational systems used in safety-critical applications typically treat reliability as a primary concern. Designers therefore focus on minimizing the device's radiation-sensitive area, often at the cost of performance. In this article, we present a mathematical model that evaluates system reliability in both spatial (i.e., radiation-sensitive area) and temporal (i.e., performance) terms, and we prove that minimizing the radiation-sensitive area does not necessarily maximize application reliability. To support this claim, we present an empirical counterexample in which application reliability improves even though the radiation-sensitive area of the device increases. We conducted an extensive radiation test campaign on a 28 nm commercial off-the-shelf ARM-based SoC. The experimental results show that, when the considered application runs at military aircraft altitude, the probability of completing a two-year mission workload without failures increases by 5.85% when the L1 caches are enabled (thus increasing the radiation-sensitive area) compared with running with no caches enabled. However, when both the L1 and L2 caches are enabled, the probability decreases by 31.59%.
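The trade-off between sensitive area and execution time can be illustrated with a minimal sketch. It assumes a constant-rate (Poisson) failure model in which the failure rate is the product of a cross-section and a particle flux; this is a common simplification and not necessarily the model developed in the article. The function name, the flux, and every per-configuration value below are hypothetical, chosen only to show the qualitative effect, and are not the measured results.

```python
import math


def survival_probability(cross_section_cm2, flux_per_cm2_s, exec_time_s):
    """Probability of finishing a workload with no failure.

    Assumes failures arrive at a constant rate equal to
    cross_section * flux (an assumption of this sketch).
    """
    failure_rate = cross_section_cm2 * flux_per_cm2_s  # failures per second
    return math.exp(-failure_rate * exec_time_s)


# Hypothetical flux, in particles / (cm^2 * s).
FLUX = 1.0e-2

# Hypothetical configurations: enabling caches enlarges the sensitive
# area (sigma, in cm^2) but shortens the workload execution time (s).
configs = {
    "no_cache": {"sigma": 1.0e-9, "time": 4.0e7},
    "l1": {"sigma": 2.0e-9, "time": 1.0e7},
    "l1_l2": {"sigma": 1.0e-8, "time": 0.8e7},
}

probs = {
    name: survival_probability(cfg["sigma"], FLUX, cfg["time"])
    for name, cfg in configs.items()
}

for name, p in probs.items():
    print(f"{name}: {p:.6f}")
```

With these illustrative numbers, the L1 configuration doubles the cross-section but cuts execution time by a factor of four, so its exposure (cross-section times time) falls and its survival probability rises. The L1 plus L2 configuration enlarges the cross-section more than it shortens the run, so its survival probability drops below the no-cache case.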
Beyond Cross-Section: Spatio-Temporal Reliability Analysis