Open Access | DOI: https://doi.org/10.37547/tajiir/Volume07Issue07-03

Hadoop to BigQuery: Migrating Automotive Data Lakes Without Downtime

Vrushali Parate, Department of Computer Science and Engineering, University of Bridgeport, Bridgeport, CT, 06604, USA

Abstract

The automotive industry is experiencing a sharp increase in data generation, driven largely by advances in vehicle technology, connectivity, and autonomous driving features. Many companies adopted Apache Hadoop data lakes to store and analyze the high volume, velocity, and variety of this automotive data. However, as demand grew for real-time analytics, scalability, and cost efficiency, Hadoop-based data lakes began to present challenges, including high operational complexity. Google BigQuery, a fully managed, serverless data warehouse and analytics platform, offers an alternative through its scalable architecture, high performance, ease of use, and integration with advanced analytics and machine learning services. Migrating such large volumes of automotive data from Hadoop to BigQuery requires careful planning and execution to minimize disruption to ongoing business operations and avoid downtime. This paper describes the typical architecture and use cases of Hadoop-based data lakes in the automotive sector, examines BigQuery as an alternative along with its benefits, and analyzes strategies and methods for a seamless migration. It further covers techniques and best practices for achieving zero downtime when migrating large automotive datasets, addresses the challenges posed by the unique characteristics of automotive data, examines case studies of successful migrations, investigates methods for ensuring data consistency and integrity, and discusses approaches for optimizing data processing and analytics workflows on BigQuery after migration.
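The abstract mentions methods for ensuring data consistency and integrity without detailing them. One common approach, shown here as a minimal Python sketch and not as the paper's own method, is to compute an order-independent fingerprint (row count plus a sum of per-row hashes) for each partition on the Hadoop side and on the BigQuery side, then flag partitions whose fingerprints differ. The function names, the `dt=` partition keys, and the sample rows are hypothetical. The sketch assumes rows from both systems have already been read and normalized to identical Python values (for example, timestamps and decimals converted the same way); otherwise equal data can hash differently.

```python
import hashlib
from typing import Iterable, Mapping, Sequence

# Hypothetical row type: a tuple of already-normalized column values.
Row = Sequence[object]


def row_digest(row: Row) -> int:
    """Hash one row to a 64-bit integer."""
    # repr() keeps None, "None", 1 and "1" distinct, and the \x1f separator
    # cannot appear inside the repr() of str, int, float, bool or None values.
    canonical = "\x1f".join(repr(value) for value in row)
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def partition_fingerprint(rows: Iterable[Row]) -> tuple[int, int]:
    """Return an order-independent (row_count, checksum) pair for one partition."""
    count, checksum = 0, 0
    for row in rows:
        count += 1
        # Addition modulo 2**64 is commutative, so row order does not matter,
        # and unlike XOR a duplicated row does not cancel itself out.
        checksum = (checksum + row_digest(row)) % (1 << 64)
    return count, checksum


def compare_partitions(
    source: Mapping[str, Iterable[Row]],
    target: Mapping[str, Iterable[Row]],
) -> list[str]:
    """Return keys of partitions whose fingerprints differ.

    A partition missing on one side is treated as empty there.
    """
    mismatched = []
    for key in sorted(set(source) | set(target)):
        src = partition_fingerprint(source.get(key, []))
        tgt = partition_fingerprint(target.get(key, []))
        if src != tgt:
            mismatched.append(key)
    return mismatched


# Hypothetical sample: daily partitions of vehicle telemetry rows.
source = {
    "dt=2024-01-01": [("VIN-A", 12.5, None), ("VIN-B", 40.0, "P0300")],
    "dt=2024-01-02": [("VIN-A", 13.1, None)],
}
target = {
    # Same rows in a different order: should match.
    "dt=2024-01-01": [("VIN-B", 40.0, "P0300"), ("VIN-A", 12.5, None)],
    # Row lost during the copy: should be flagged.
    "dt=2024-01-02": [],
}

print(compare_partitions(source, target))  # ['dt=2024-01-02']
```

Comparing per-partition fingerprints rather than whole tables narrows a mismatch down to the partitions that need to be re-copied before cutover. In a real migration the fingerprints would more likely be computed where the data lives, using a hash function available on both platforms, rather than by pulling rows into Python.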

Keywords

Data Lake, Hadoop, BigQuery, Automotive Industry, Migration



How to Cite

Vrushali Parate. (2025). Hadoop to BigQuery: Migrating Automotive Data Lakes Without Downtime. The American Journal of Interdisciplinary Innovations and Research, 7(07), 16–27. https://doi.org/10.37547/tajiir/Volume07Issue07-03