Applied Sciences
Open Access
Comprehensive Fault-Tolerant Computing Architectures for Safety-Critical Embedded and Multi-Core Systems
Dr. Arun K. Mehra, Institute of Dependable Embedded Systems, Prakash University
Abstract
This article synthesizes contemporary approaches to fault tolerance in embedded and safety-critical processors. It proposes an integrative conceptual framework that unifies soft-error mitigation, lock-step replication, hybrid error-detection architectures, and redundancy-aware scheduling, and it evaluates the trade-offs that shape practical deployment in harsh and regulated environments.
Methods: Drawing only on the works listed under References, the study integrates published methods for error detection and mitigation at the microarchitectural, core, and system levels. It translates their empirical findings into a cohesive methodology for designing resilient multi-core and lock-step systems, without relying on new experimental data.
Results: The synthesis shows how transient-fault recovery techniques (including simultaneous multithreading and core replication) can be combined with trace-interface-based detection, PTM-informed hybrid detectors, and embedded debug features to produce scalable resilience with optimized cost, power, and latency. It also characterizes the contexts (radiation-prone aerospace, automotive zonal controllers, and industrial electronics) in which each approach yields the greatest benefit.
Conclusions: Integrating low-cost hardware redundancy with software-aware detection and recovery strategies yields the best balance between safety integrity, resource overhead, and system performance. The paper identifies design patterns, potential pitfalls, and research directions that arise when adopting Triple Core Lock-Step and hybrid PTM detection schemes on modern ARM-class processors targeted at ASIL D and ultra-reliable applications.
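To make the lock-step idea concrete, the following is a minimal Python sketch of 2-of-3 majority voting as it might be applied to the outputs of three replicated cores. It is an illustration only: the function names `majority_vote` and `vote_and_locate` are hypothetical, the word-level software model is an assumption, and real Triple Core Lock-Step voters are implemented in hardware rather than in code like this.

```python
from typing import Optional, Tuple


def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit follows at least two inputs."""
    return (a & b) | (a & c) | (b & c)


def vote_and_locate(a: int, b: int, c: int) -> Tuple[int, Optional[int]]:
    """Return the voted word and the index of the single disagreeing core.

    The index is None when all cores agree, or when more than one core
    differs from the voted word and no single core can be blamed.
    (Hypothetical helper for illustration, not taken from the article.)
    """
    voted = majority_vote(a, b, c)
    mismatched = [i for i, out in enumerate((a, b, c)) if out != voted]
    faulty = mismatched[0] if len(mismatched) == 1 else None
    return voted, faulty


if __name__ == "__main__":
    golden = 0xDEADBEEF
    upset = golden ^ 0x10  # model a single-bit upset on core 1
    word, core = vote_and_locate(golden, upset, golden)
    print(hex(word), core)
```

In this sketch a single corrupted core is masked by the vote and identified by its index, which is the information a recovery step would need in order to resynchronize that core; how resynchronization is actually performed is outside the scope of the sketch.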
Keywords
artificial intelligence, data analytics, product management, program management, product-program manager
References
F. Abate, L. Sterpone, and M. Violante, A new mitigation approach for soft errors in embedded processors, IEEE Transactions on Nuclear Science, vol. 55, no. 4, pp. 2063–2069, Aug. 2008.
M. Violante, C. Meinhardt, R. Reis, and M. S. Reorda, A low-cost solution for deploying processor cores in harsh environments, IEEE Transactions on Industrial Electronics, vol. 58, no. 7, pp. 2617–2626, Jul. 2011.
V. Bernon-Enjalbert et al., Safety Integrated Hardware Solutions to Support ASIL D Applications, 2013.
X. Iturbe, B. Venu, E. Ozer and S. Das, "A Triple Core Lock-Step (TCLS) ARM® Cortex®-R5 Processor for Safety-Critical and Ultra-Reliable Applications," 2016 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshop (DSN-W), Toulouse, 2016, pp. 246-249.
M. Portela-García et al., On the use of embedded debug features for permanent and transient fault resilience in microprocessors. Microprocessors and Microsystems, 36(5), pp. 334-343, 2012.
L. Entrena, A. Lindoso, M. Portela-García, L. Parra, B. Du, M. Sonza Reorda, L. Sterpone, Fault-tolerance techniques for soft-core processors using the Trace Interface, In “FPGAs and Parallel Architectures for Aerospace Applications. Soft Errors and Fault-Tolerant Design”, Springer, 2015.
M. Peña-Fernandez, A. Lindoso, L. Entrena, M. Garcia-Valderas, S. Philippe, Y. Morilla, P. Martin-Holgado. PTM-based hybrid error-detection architecture for ARM microprocessors. Microelectronics Reliability, 88, pp. 925-930, 2018.
T. N. Vijaykumar, I. Pomeranz, and K. Cheng, “Transient-fault recovery using simultaneous multithreading,” in Proc. 29th Annu. Int. Symp. Comput. Archit., Anchorage, AK, USA, 2002, pp. 87–98.
M. Gomaa, C. Scarbrough, T. N. Vijaykumar and I. Pomeranz, "Transient-fault recovery for chip multiprocessors," 30th Annual International Symposium on Computer Architecture, 2003. Proceedings., San Diego, CA, USA, 2003, pp. 98-109.
Abdul Salam Abdul Karim. (2023). Fault-Tolerant Dual-Core Lockstep Architecture for Automotive Zonal Controllers Using NXP S32G Processors. International Journal of Intelligent Systems and Applications in Engineering, 11(11s), 877–885. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/7749
K. Chen, G. von der Brüggen, and J. Chen, “Reliability optimization on multi-core systems with multi-tasking and redundant multi-threading,” IEEE Transactions on Computers, vol. 67, no. 4, pp. 484–497, April 2018.
Copyright License
Copyright (c) 2023 Dr. Arun K. Mehra

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain the copyright of their manuscripts, and all Open Access articles are disseminated under the terms of the Creative Commons Attribution License 4.0 (CC-BY), which licenses unrestricted use, distribution, and reproduction in any medium, provided that the original work is appropriately cited. The use of general descriptive names, trade names, trademarks, and so forth in this publication, even if not specifically identified, does not imply that these names are not protected by the relevant laws and regulations.

