Engineering Trust in AI Systems: A Data-Layer Framework for Explainability and Auditability
Mohammed Arbaaz Shareef, Lead Data Engineer at Anblicks
Abstract
The article examines engineering approaches to strengthening trust in AI systems through data-layer controls that make decisions explainable and verifiable in audits. Its practical relevance stems from widening regulatory and organizational demands for traceability of training data, reproducibility of pipelines, and defensible documentation of model behavior in production. The scientific novelty lies in integrating provenance capture, lineage graphs, feature-store governance, and standardized documentation artifacts into a single coherent data-layer framework that produces machine-checkable evidence. The work describes lifecycle evidence generation from ingestion to inference, examines how provenance models and lineage datasets support inspection, and analyzes how documentation instruments complement technical traces. Special attention is given to preventing evidence gaps caused by opaque preprocessing, weak versioning, and incomplete logging. The study aims to systematize a data-layer architecture that supports explainability and auditability without relying on new model classes. It employs a comparative analysis of recent research, a synthesis of published frameworks, and a structured review of sources. The conclusion summarizes actionable controls and their expected audit outputs. The article targets engineers, MLOps teams, risk functions, and internal and external auditors.
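The idea of data-layer controls that emit machine-checkable evidence can be illustrated with a minimal sketch. This is not code from the article; the `ProvenanceRecord` structure, field names, and step name are illustrative assumptions. The sketch content-addresses each pipeline step's inputs and outputs with SHA-256 so an auditor can verify artifacts byte-for-byte, then serializes the record as JSON evidence:

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class ProvenanceRecord:
    """One machine-checkable evidence entry for a single pipeline step."""
    step: str
    input_digest: str
    output_digest: str
    params: dict
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def digest(payload: bytes) -> str:
    """Content-address an artifact so auditors can verify it byte-for-byte."""
    return hashlib.sha256(payload).hexdigest()

def record_step(step: str, raw_in: bytes, raw_out: bytes,
                params: dict) -> ProvenanceRecord:
    """Capture provenance for one transformation in the pipeline."""
    return ProvenanceRecord(step, digest(raw_in), digest(raw_out), params)

# Example: log a (hypothetical) preprocessing step and emit JSON evidence.
raw = b"id,age\n1,34\n2,29\n"
cleaned = b"id,age\n1,34\n2,29\n"  # no-op here; real code would transform
rec = record_step("drop_nulls", raw, cleaned, {"columns": ["age"]})
evidence = json.dumps(asdict(rec))
```

Chaining such records by digest (each step's `input_digest` matching the previous step's `output_digest`) yields a lineage graph in the sense the abstract describes, without requiring any particular model class.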
Keywords
trustworthy AI, explainability, auditability, data provenance, data lineage, feature store, model documentation, traceability, metadata governance, reproducible pipelines.
References
Ahmed, M., Dar, A. R., Helfert, M., Khan, A., & Kim, J. (2023). Data provenance in healthcare: Approaches, challenges, and future directions. Sensors, 23(14), 6495. https://doi.org/10.3390/s23146495
Chen, Y., Zhao, Y., Li, X., Zhang, J., Long, J., & Zhou, F. (2024). An open dataset of data lineage graphs for data governance research. Visual Informatics, 8(1), 1–5. https://doi.org/10.1016/j.visinf.2024.01.001
de la Rúa Martínez, J., Buso, F., Kouzoupis, A., Ormenisan, A. A., Niazi, S., Bzhalava, D., Mak, K., Jouffrey, V., Ronström, M., Cunningham, R., Zangis, R., Mukhedkar, D., Khazanchi, A., Vlassov, V., & Dowling, J. (2024). The Hopsworks feature store for machine learning. In Companion of the 2024 International Conference on Management of Data (SIGMOD ’24) (pp. 135–147). Association for Computing Machinery. https://doi.org/10.1145/3626246.3653389
Gilbert, S., Adler, R., Holoyad, T., & Weicken, E. (2025). Could transparent model cards with layered accessible information drive trust and safety in health AI? npj Digital Medicine, 8(1), 124. https://doi.org/10.1038/s41746-025-01482-9
Kalokyri, V., Tachos, N. S., Kalantzopoulos, C. N., Sfakianakis, S., Kondylakis, H., Zaridis, D. I., Colantonio, S., Regge, D., Papanikolaou, N., Marias, K., Fotiadis, D. I., & Tsiknakis, M. (2025). AI model passport: Data and system traceability framework for transparent AI in health. Computational and Structural Biotechnology Journal, 28, 386–404. https://doi.org/10.1016/j.csbj.2025.09.041
Liu, R., Park, K., Psallidas, F., Zhu, X., Mo, J., Sen, R., Interlandi, M., Karanasos, K., Tian, Y., & Camacho-Rodríguez, J. (2023). Optimizing data pipelines for machine learning in feature stores. Proceedings of the VLDB Endowment, 16(13), 4230–4239. https://doi.org/10.14778/3625054.3625060
Longpre, S., Mahari, R., Chen, A., et al. (2024). A large-scale audit of dataset licensing and attribution in AI. Nature Machine Intelligence, 6, 975–987. https://doi.org/10.1038/s42256-024-00878-8
Mökander, J., Schuett, J., Kirk, H. R., et al. (2024). Auditing large language models: A three-layered approach. AI Ethics, 4, 1085–1115. https://doi.org/10.1007/s43681-023-00289-2
Schlegel, M., & Sattler, K.-U. (2025). Capturing end-to-end provenance for machine learning pipelines. Information Systems, 132, 102495. https://doi.org/10.1016/j.is.2024.102495
Staufer, L., Yang, M., Reuel, A., & Casper, S. (2025). Audit cards: Contextualizing AI evaluations (arXiv:2504.13839). arXiv. https://arxiv.org/abs/2504.13839
Copyright License
Copyright (c) 2026 Mohammed Arbaaz Shareef

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain the copyright of their manuscripts, and all Open Access articles are disseminated under the terms of the Creative Commons Attribution License 4.0 (CC-BY), which permits unrestricted use, distribution, and reproduction in any medium, provided that the original work is appropriately cited. The use of general descriptive names, trade names, trademarks, and so forth in this publication, even if not specifically identified, does not imply that these names are not protected by the relevant laws and regulations.

