Pradeep Rao Vennamaneni. “Optimizing Cloud-Native LLM Workloads With Serverless GPU Orchestration and Token-Aware Scheduling”. The American Journal of Engineering and Technology 4, no. 04 (April 25, 2024): 33–59. Accessed October 9, 2025. https://www.theamericanjournals.com/index.php/tajet/article/view/6603.