Engineering and Technology | Open Access

Adaptive Resilience: Integrating Ansible-Based Dynamic Scaling and Formal Chaos Engineering for AI-Enabled Microservices in Hybrid Cloud Environments

Elena V. Rostova, Independent Researcher, Cloud Systems & Reliability Engineering, Zurich, Switzerland
Marcus J. Thorne, Institute for Computational Resilience, Boston, MA, USA

Abstract

Background: The proliferation of AI-enabled microservices in enterprise environments calls for robust dynamic scaling strategies. Platform-as-a-Service (PaaS) offerings scale natively, but they often suffer from cold-start latency and unpredictable costs under variable workloads, such as refinery turnarounds or large-scale data processing events.

Methods: This study introduces a Resilient Scaling Orchestrator (RSO) that integrates Ansible-based automation with formal process-algebraic models to optimize end-to-end dynamic scaling. We employ a hybrid methodology: formal component modeling predicts system states, and chaos engineering experiments validate resilience in practice. Ansible playbooks pre-warm instances based on predictive heuristics, which mitigates cold-start latency.
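
As an illustration of the pre-warming idea only, the following minimal sketch forecasts load with an exponentially weighted moving average and derives how many extra instances to warm up. It is not the RSO implementation: the class `PrewarmPlanner`, the parameters `alpha` and `headroom`, the playbook name `prewarm.yml`, and the variable `prewarm_count` are all hypothetical names chosen for this example. The only Ansible feature assumed is the standard `--extra-vars` option of `ansible-playbook`, which accepts a JSON string.

```python
import json
import math


class PrewarmPlanner:
    """Hypothetical planner: smooths observed load and sizes a pre-warm pool."""

    def __init__(self, capacity_per_instance, alpha=0.5, headroom=1.2):
        self.capacity = capacity_per_instance  # requests/s one instance can serve (assumed known)
        self.alpha = alpha                      # smoothing factor in (0, 1]
        self.headroom = headroom                # safety margin applied to the forecast
        self.forecast = None                    # no forecast until the first observation

    def observe(self, requests_per_second):
        """Update the exponentially weighted moving average with a new sample."""
        if self.forecast is None:
            self.forecast = requests_per_second
        else:
            self.forecast = (self.alpha * requests_per_second
                             + (1 - self.alpha) * self.forecast)

    def instances_to_prewarm(self, running):
        """Return how many instances to add on top of those already running."""
        if self.forecast is None:
            return 0
        target = math.ceil(self.forecast * self.headroom / self.capacity)
        return max(0, target - running)


def build_ansible_command(prewarm_count, playbook="prewarm.yml"):
    """Build an ansible-playbook invocation; the playbook and variable are hypothetical."""
    extra_vars = json.dumps({"prewarm_count": prewarm_count})
    return ["ansible-playbook", playbook, "--extra-vars", extra_vars]


if __name__ == "__main__":
    planner = PrewarmPlanner(capacity_per_instance=100.0)
    for sample in (100.0, 300.0):
        planner.observe(sample)
    extra = planner.instances_to_prewarm(running=1)
    print(build_ansible_command(extra))
```

The planner only decides a count; actually running the command (for example with `subprocess.run`) and the contents of the playbook are left out, since the abstract does not describe them.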

Results: Experimental validation using industry-standard microservices benchmarks demonstrates that the proposed RSO reduces cold-start latency by approximately 40% compared to reactive Azure PaaS autoscaling. Furthermore, the integration of formal verification ensures that 99.9% of scaling operations maintain transactional integrity even under induced chaos scenarios.

Conclusion: The findings suggest that combining infrastructure-as-code tools with formal mathematical modeling provides a superior framework for managing the cost-performance trade-off in cloud-native AI applications.

Keywords

Cloud Computing, Dynamic Scaling, Microservices, Ansible

How to Cite

Elena V. Rostova, & Marcus J. Thorne. (2025). Adaptive Resilience: Integrating Ansible-Based Dynamic Scaling and Formal Chaos Engineering for AI-Enabled Microservices in Hybrid Cloud Environments. The American Journal of Engineering and Technology, 7(11), 117–123. Retrieved from https://www.theamericanjournals.com/index.php/tajet/article/view/6955