Pradeep Rao Vennamaneni (2024) “Optimizing Cloud-Native LLM Workloads with Serverless GPU Orchestration and Token-Aware Scheduling”, The American Journal of Engineering and Technology, 4(04), pp. 33–59. Available at: https://www.theamericanjournals.com/index.php/tajet/article/view/6603 (Accessed: 26 January 2026).