vLLM

2 posts

Published on
May 6, 2026
Planning an AI Teaching Assistant and Moving from Ollama to vLLM to Handle 50 Concurrent Users
I analyzed the performance bottlenecks that appeared as concurrent requests increased on an Ollama-based LLM server, then moved the serving layer to vLLM so it could reliably handle around 50 concurrent users.
Featured, LLM, vLLM, Kubernetes, Infra
Published on
May 2, 2026
Running a vLLM GPU Workload on k3s
Setting up the NVIDIA driver, runtime, k3s containerd, RuntimeClass, and device plugin needed to run the Code Place AI assistant on an operations cluster.
Featured, Kubernetes, k3s, vLLM, CUDA, Infra