Abstract
Cyber-physical systems (CPSs) often rely on massive redundancy to meet application requirements for high reliability. Although such redundancy is necessary, it can be activated adaptively, based on the current state of the controlled plant. Most of the time, the plant is in a state that allows for a lower level of fault tolerance. Avoiding the continuous deployment of massive fault tolerance greatly reduces the workload of the CPS and lowers the operating temperature of the cyber subsystem, thus increasing its reliability. In this article, we extend our prior research by presenting Adaptive Fault Tolerance (AdaFT), a software simulation framework that automatically generates the sub-spaces within which our adaptive fault tolerance can be applied. We also show the theoretical benefits of AdaFT and its implementation in several real-world CPSs.
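The state-based idea in the abstract can be illustrated with a minimal sketch. It is not AdaFT itself: the `SubSpace` type, the one-dimensional deviation measure, the thresholds, and the replica counts are all hypothetical values chosen for illustration, whereas the framework is described as generating its sub-spaces automatically. The sketch only shows how a controller might look up a redundancy level from the current plant state, and how the average number of replicas falls when the plant spends most of its time in a benign region.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubSpace:
    """A region of plant state space and the replica count used inside it."""
    max_deviation: float  # hypothetical bound on |state - setpoint|
    replicas: int         # number of redundant task copies to run


# Hypothetical sub-spaces, ordered from most benign to most critical.
# In the article these regions are generated by the framework, not hand-picked.
SUBSPACES = (
    SubSpace(max_deviation=0.2, replicas=1),           # simplex
    SubSpace(max_deviation=0.6, replicas=2),           # duplex
    SubSpace(max_deviation=float("inf"), replicas=3),  # triplex
)


def replicas_for(deviation: float) -> int:
    """Return the replica count of the first sub-space containing the state."""
    magnitude = abs(deviation)
    for region in SUBSPACES:
        if magnitude <= region.max_deviation:
            return region.replicas
    # Only reachable for NaN, which compares false against every bound.
    raise ValueError("deviation must be a real number")


def mean_replicas(trace) -> float:
    """Average replica count over a sequence of observed deviations."""
    levels = [replicas_for(d) for d in trace]
    return sum(levels) / len(levels)


# Made-up trace in which the plant is mostly close to its setpoint.
trace = [0.05, 0.1, -0.15, 0.3, 0.7, 0.1, 0.0, -0.05]
print(mean_replicas(trace))  # 1.375, versus 3.0 with always-on triplication
```

Under these assumed numbers the computational load is less than half that of always-on triplication, which is the kind of workload reduction the abstract ties to lower operating temperature.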
Index Terms
AdaFT: A Framework for Adaptive Fault Tolerance for Cyber-Physical Systems