Planning an AI Teaching Assistant and Moving from Ollama to vLLM to Handle 50 Concurrent Users
I analyzed the performance bottlenecks that appeared on an Ollama-based LLM server as concurrent requests increased, then moved the serving layer to vLLM so that it could reliably handle around 50 concurrent users.
Featured, LLM, vLLM, Kubernetes, Infra