Holistic Resilience in Modern Embedded Architectures: Soft-Error Mitigation, Traceability, and Fault-Tolerant Design
Open Access
Ananya R. Mehra, Department of Computer Engineering, Westbridge Institute of Technology
Abstract
Modern safety-critical domains — including automotive zonal controllers, aerospace avionics, and industrial control systems — demand embedded computing platforms that deliver high performance while guaranteeing reliability under transient faults and malicious disturbances. Existing literature highlights techniques spanning hardware redundancy, software assertions, trace-based observability, and controller-level mitigation, yet integrating these approaches into cohesive, deployable architectures remains challenging (Entrena et al., 2012; Arifeen et al., 2020).
Objective: This paper synthesizes established and emerging strategies to propose a unified conceptual architecture for resilient heterogeneous embedded systems. The work aims to reconcile the competing objectives of performance, observability, and safety certification by combining selective software-only detection, hardware soft-error controllers, and advanced trace/monitoring infrastructures.
Methods: We perform an in-depth theoretical synthesis of published methods — including selective software assertions (Chielle et al., 2015), soft-error mitigation controllers (Xilinx Inc., 2014), CoreSight trace architectures (ARM Ltd., 2009; 2011), and multilevel emulation-based fault injection (Entrena et al., 2012). We develop a narrative method that maps failure modes to countermeasures and elaborates design patterns for co-design, emphasizing component-level contracts, traceability, and staged mitigation.
Results: The proposed architecture layers lightweight software assertions over hardware error detection and correction at the memory and interconnect levels, integrates a CoreSight-style program-flow trace fabric for telemetry, and places a configurable soft-error mitigation (SEM) controller in each critical fault domain. Analytical reasoning indicates that this composition provides graceful degradation, improved diagnosability, and a path to satisfying stringent safety standards while bounding performance overheads.
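The layering described above can be illustrated with a small sketch. It is not the paper's implementation. It assumes a Hamming(7,4) single-error-correcting code as a stand-in for hardware memory protection and a simple range check as a stand-in for a selective software assertion. The names `VALID_MAX`, `read_protected`, and `campaign` are hypothetical and chosen only for this illustration. The sketch injects every single-bit and double-bit flip into a protected value and counts how each layer responds.

```python
from itertools import combinations
from typing import Dict, Tuple

# Hypothetical invariant: the protected value is a state code in 0..9.
VALID_MAX = 9


def hamming74_encode(nibble: int) -> int:
    """Encode 4 data bits into a 7-bit codeword (parity at positions 1, 2, 4)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]  # covers positions 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # covers positions 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]  # covers positions 5, 6, 7
    bits = [p1, p2, d[0], p4, d[1], d[2], d[3]]  # index i holds position i + 1
    return sum(b << i for i, b in enumerate(bits))


def hamming74_decode(word: int) -> Tuple[int, int]:
    """Return (data nibble, syndrome); a nonzero syndrome means one bit was flipped back."""
    bits = [(word >> i) & 1 for i in range(7)]
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos - 1]:
            syndrome ^= pos  # XOR of set-bit positions is 0 for a valid codeword
    if syndrome:
        bits[syndrome - 1] ^= 1  # assume a single-bit error and correct it
    nibble = bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)
    return nibble, syndrome


def read_protected(word: int) -> Tuple[str, int]:
    """Apply both layers: ECC-style correction first, then the software assertion."""
    value, syndrome = hamming74_decode(word)
    if value > VALID_MAX:
        return "assertion_failed", value  # a real system would escalate here
    return ("corrected" if syndrome else "clean"), value


def campaign() -> Dict[str, int]:
    """Inject all single and double bit flips for every valid value and tally outcomes."""
    counts = {"single_corrected": 0, "double_detected": 0, "double_silent": 0}
    for value in range(VALID_MAX + 1):
        word = hamming74_encode(value)
        for bit in range(7):
            outcome, got = read_protected(word ^ (1 << bit))
            if outcome == "corrected" and got == value:
                counts["single_corrected"] += 1
        for a, b in combinations(range(7), 2):
            outcome, got = read_protected(word ^ (1 << a) ^ (1 << b))
            if outcome == "assertion_failed":
                counts["double_detected"] += 1
            elif got != value:
                counts["double_silent"] += 1
    return counts


if __name__ == "__main__":
    print(campaign())
```

In this toy model the correcting code repairs every single-bit flip, while double-bit flips are miscorrected and only some of them are caught by the range assertion. The remaining silent cases suggest why the architecture also calls for trace-based observability and a staged mitigation path rather than relying on any single layer.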
Conclusions: A co-design strategy that explicitly couples observability (trace), selective detection (assertions), and hardware mitigation (SEM) yields a pragmatic path toward certifiable, high-performance embedded platforms. Future work should validate the architecture through targeted fault-injection experiments and quantify trade-offs in representative automotive and aerospace workloads.
Keywords
fault tolerance, soft errors, hardware-software co-design, CoreSight trace
References
CoreSight Program Flow Trace. Architecture Specification, ARM Ltd., IHI 0035B, 2011.
Soft Error Mitigation Controller v4.1. Product Guide, Xilinx Inc., PG036, Nov. 2014.
Chielle, E., et al. S-seta: Selective software-only error-detection technique using assertions. IEEE Transactions on Nuclear Science, vol. 62, no. 6, pp. 3088–3095, Dec. 2015.
Dodd, P. E., et al. Neutron-induced latchup in SRAMs at ground level. In 2003 IEEE International Reliability Physics Symposium Proceedings, 41st Annual, 2003, pp. 51–55.
Domeika, M. Software Development for Embedded Multi-core Systems. A Practical Guide Using Embedded Intel Architecture. Elsevier Inc., 2008. ISBN 978-0-7506-8539-9.
Dubrova, E. Fault Tolerant Design: An Introduction. 2008. Available from: http://www.pld.ttu.ee/IAF0530/draft.pdf. [Accessed September 2017].
Entrena, L., et al. Soft error sensitivity evaluation of microprocessors by multilevel emulation-based fault injection. IEEE Transactions on Computers, vol. 61, no. 3, pp. 313–322, March 2012.
ESA. ESA/SCC Basic specification n. 25100: Single Event Effects Test Method and Guidelines. Noordwijk, Netherlands, 2005.
Alcaide Portet, S. Hardware/Software solutions to enable the use of high-performance processors in the most stringent safety-critical systems. 2023.
Abdul Salam Abdul Karim. Fault-Tolerant Dual-Core Lockstep Architecture for Automotive Zonal Controllers Using NXP S32G Processors. International Journal of Intelligent Systems and Applications in Engineering, 11(11s), 877–885, 2023. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/7749
Arifeen, T., Hassan, A. S., & Lee, J. A. Approximate triple modular redundancy: A survey. IEEE Access, vol. 8, pp. 139851–139867, 2020.
Arthur, D., Becker, C., Epstein, A., Uhl, B., & Ranville, S. Foundations of automotive software (No. DOT HS 813 226). United States Department of Transportation, National Highway Traffic Safety Administration, 2022.
Beckers, A., Guilley, S., Maurine, P., O'Flynn, C., & Picek, S. (Adversarial) electromagnetic disturbance in the industry. IEEE Transactions on Computers, vol. 72, no. 2, pp. 414–422, 2022.
Chamorro, W., Sola, J., & Andrade-Cetto, J. Event-based line SLAM in real-time. IEEE Robotics and Automation Letters, vol. 7, no. 3, pp. 8146–8153, 2022.
Copyright License
Copyright (c) 2023 Ananya R. Mehra

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain the copyright of their manuscripts, and all Open Access articles are disseminated under the terms of the Creative Commons Attribution License 4.0 (CC-BY), which licenses unrestricted use, distribution, and reproduction in any medium, provided that the original work is appropriately cited. The use of general descriptive names, trade names, trademarks, and so forth in this publication, even if not specifically identified, does not imply that these names are not protected by the relevant laws and regulations.

