PRADEEP RAO VENNAMANENI. Optimizing Cloud-Native LLM Workloads with Serverless GPU Orchestration and Token-Aware Scheduling. The American Journal of Engineering and Technology, [S. l.], v. 4, n. 04, p. 33–59, 2024. Available at: https://www.theamericanjournals.com/index.php/tajet/article/view/6603. Accessed: 26 Jan. 2026.