Can you run LLMs on Kubernetes from a mini PC under your desk? Yes ✅ and here's the full stack in 90 seconds.

🔧 Custom active cooling for the NVIDIA Tesla T4 (N3rdware bracket + 5V fan swap)
🖥️ GPU passthrough on Proxmox, including the 4 kernel parameters most people miss on the MS-02
☸️ Kubernetes cluster provisioned with Cluster API (GPU + CPU worker pools)
⚡ NVIDIA GPU Operator: drivers, device plugin, DCGM monitoring in a single Helm install
🔀 GPU time-slicing: 1 physical Tesla T4 → 4 schedulable GPU slots in Kubernetes
🤖 LLM inference with Ollama: Llama 3.1 8B + DeepSeek-R1 7B
📊 Gateway API Inference Extension: KV-cache-aware routing, A/B testing, load shedding
🔗 GitHub repo (all configs + manifests): https://github.com/isItObservable/K8s-LLM

#Kubernetes #GPU #LLM #Proxmox #Homelab #NVIDIA #CloudNative #IsItObservable
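The video's exact four easily-missed kernel parameters for the MS-02 aren't reproduced in this description. As a rough sketch, a standard Intel IOMMU passthrough setup on a Proxmox host looks like this (these are the commonly documented Proxmox settings, not confirmed from the video):

```shell
# /etc/default/grub -- enable the IOMMU and pass devices through untranslated
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"

# /etc/modules -- load the VFIO modules at boot
# (vfio_virqfd was merged into vfio in kernel 6.2+, so newer Proxmox hosts can omit it)
vfio
vfio_iommu_type1
vfio_pci

# Apply the bootloader change and reboot, then verify IOMMU groups with:
#   dmesg | grep -e DMAR -e IOMMU
update-grub
```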
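Provisioning with Cluster API generally means initializing a management cluster with `clusterctl` and an infrastructure provider, then applying a generated cluster manifest. A minimal sketch, assuming the community Proxmox provider (the provider, cluster name, and counts below are placeholders, not taken from the video):

```shell
# Install the Cluster API controllers plus the Proxmox infrastructure
# provider (assumed) into the management cluster
clusterctl init --infrastructure proxmox

# Render a workload cluster manifest; GPU and CPU worker pools are
# typically modeled as separate MachineDeployments in the template
clusterctl generate cluster homelab \
  --kubernetes-version v1.30.0 \
  --control-plane-machine-count 1 \
  --worker-machine-count 2 > homelab-cluster.yaml

kubectl apply -f homelab-cluster.yaml
```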
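The "single Helm install" for the GPU Operator follows NVIDIA's standard chart; a minimal sketch (release and namespace names are conventional, not confirmed from the repo):

```shell
# Add NVIDIA's Helm repository and install the GPU Operator, which
# deploys the driver, container toolkit, device plugin and DCGM
# exporter as Kubernetes workloads
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update
helm install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator --create-namespace

# Watch the operator roll out its components
kubectl get pods -n gpu-operator
```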
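Turning one T4 into four schedulable slots uses the device plugin's time-slicing config. A sketch following NVIDIA's documented ConfigMap shape (names like `time-slicing-config` and the `any` key are illustrative):

```yaml
# ConfigMap consumed by the GPU Operator's device plugin: advertise
# each physical nvidia.com/gpu as 4 time-sliced replicas
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: gpu-operator
data:
  any: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 4
```

The operator then has to be pointed at this config, e.g. `kubectl patch clusterpolicy/cluster-policy -n gpu-operator --type merge -p '{"spec": {"devicePlugin": {"config": {"name": "time-slicing-config", "default": "any"}}}}'`, after which the node reports a capacity of 4 `nvidia.com/gpu`.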
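Scheduling Ollama onto one of those GPU slots is an ordinary Deployment with a `nvidia.com/gpu` resource limit; this is a generic sketch, not the manifest from the repo:

```yaml
# Minimal Ollama Deployment requesting one time-sliced GPU slot
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
        - name: ollama
          image: ollama/ollama:latest
          ports:
            - containerPort: 11434   # Ollama's HTTP API
          resources:
            limits:
              nvidia.com/gpu: 1      # one of the 4 time-sliced T4 slots
```

Models can then be pulled inside the pod, e.g. `kubectl exec deploy/ollama -- ollama pull llama3.1:8b` (model tag assumed).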