Engineering and Technology | Open Access | DOI: https://doi.org/10.37547/tajet/Volume07Issue11-10

The Evolution of Data Architectures: Leveraging Lakehouse Systems with Apache Iceberg for Privacy-Preserving Machine Learning Pipelines

Shivaprasad Sankesha Narayana, Senior Architect, SAIPSIT Inc, Houston, Texas, United States

Abstract

This paper examines data lakehouse architectures as a significant shift in enterprise data infrastructure, with a focus on storage built on Apache Iceberg. We cover the capabilities of these systems across the full data life cycle, from ingestion to visualization, and how machine learning can augment each stage. We also examine execution frameworks based on directed acyclic graphs (DAGs) and the privacy implications of those workflows. Our results indicate that this integrated approach improves operational efficiency, analytical flexibility, and compliance relative to traditional, siloed architectures.

Keywords

Data Lakehouse, Apache Iceberg, Machine Learning Augmentation, DAG Execution, Privacy Engineering.

How to Cite

Shivaprasad Sankesha Narayana. (2025). The Evolution of Data Architectures: Leveraging Lakehouse Systems with Apache Iceberg for Privacy-Preserving Machine Learning Pipelines. The American Journal of Engineering and Technology, 7(11), 85–94. https://doi.org/10.37547/tajet/Volume07Issue11-10