The Evolution of Data Architectures: Leveraging Lakehouse Systems with Apache Iceberg for Privacy-Preserving Machine Learning Pipelines
Shivaprasad Sankesha Narayana, Senior Architect, SAIPSIT Inc, Houston, Texas, United States
Abstract
This paper examines data lakehouse architectures as a significant shift in enterprise data infrastructure, focusing on storage built on Apache Iceberg. We cover how these systems handle data throughout its life cycle, from ingestion to visualization, and how machine learning can enhance that life cycle. We also examine execution frameworks based on directed acyclic graphs (DAGs) and the privacy implications of those workflows. Our results indicate that this integrated approach improves operational efficiency, analytical flexibility, and compliance relative to traditional, siloed architectures.
Keywords
Data Lakehouse, Apache Iceberg, Machine Learning Augmentation, DAG Execution, Privacy Engineering.
Copyright License
Copyright (c) 2025 Shivaprasad Sankesha Narayana

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain the copyright of their manuscripts, and all Open Access articles are disseminated under the terms of the Creative Commons Attribution License 4.0 (CC-BY), which licenses unrestricted use, distribution, and reproduction in any medium, provided that the original work is appropriately cited. The use of general descriptive names, trade names, trademarks, and so forth in this publication, even if not specifically identified, does not imply that these names are not protected by the relevant laws and regulations.


Engineering and Technology