In this tutorial, we deploy DeepSeek-V4-Pro, a 1.6-trillion-parameter open-weight MoE model, across two 8x NVIDIA H100-80G-PCIe-NVLink nodes on Hyperstack Kubernetes using vLLM's hybrid Data + Expert Parallelism topology.

What's covered:

- Why V4-Pro requires multi-node deployment: the 960 GB checkpoint alone exceeds a single node's 8 × 80 GB = 640 GB VRAM ceiling
- Standing up a 2-worker Kubernetes cluster on Hyperstack with the NVIDIA GPU Operator pre-installed
- Installing the LeaderWorkerSet controller and coordinating leader/worker pods across nodes
- Pre-downloading the 960 GB of weights to per-node NVMe using a parallel Kubernetes Job (see the download sketch below)
- Deploying V4-Pro with hybrid DEP, MTP speculative decoding, and an FP8 KV cache
- Switching between Non-think, Think High, and Think Max reasoning modes (see the client sketch below)
- Running a long-horizon autonomous refactoring agent with tool calling (see the tool-calling sketch below)
- Connecting Claude Code, OpenClaw, and OpenCode to the cluster as a local backend

DeepSeek-V4-Pro scores 80.6 on SWE-Bench Verified and 93.5 on LiveCodeBench v6 with 49B active parameters per token.

Full step-by-step tutorial: https://bit.ly/4eDJK4V

#DeepSeek #KubernetesAI #vLLM #Hyperstack #GPUCloud #AgenticAI #OpenSource #NVIDIA #MultiNodeLLM #LLMDeployment
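The weight pre-download step boils down to pulling the checkpoint onto each node's local NVMe before the server pods start, so vLLM loads from disk instead of the network. A minimal sketch in Python, assuming the checkpoint is published as the hypothetical Hugging Face repo `deepseek-ai/DeepSeek-V4-Pro` and that NVMe is mounted at `/mnt/nvme` (both assumptions; the tutorial wraps this in a parallel Kubernetes Job that runs once per node):

```python
from huggingface_hub import snapshot_download

# Pull the full checkpoint to local NVMe with parallel download streams.
snapshot_download(
    repo_id="deepseek-ai/DeepSeek-V4-Pro",         # hypothetical repo id
    local_dir="/mnt/nvme/models/deepseek-v4-pro",  # per-node NVMe path (assumed)
    max_workers=16,                                # parallel connections
)
```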
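Once the leader pod is serving, reasoning modes are selected per request rather than at deploy time. vLLM exposes an OpenAI-compatible endpoint, so a client sketch looks like the following; the service name, port, and the `chat_template_kwargs` mode key are assumptions, since the exact switch for Non-think / Think High / Think Max isn't spelled out here:

```python
from openai import OpenAI

# vLLM serves an OpenAI-compatible API; the host and port are
# placeholders for whatever service the LeaderWorkerSet exposes.
client = OpenAI(base_url="http://vllm-leader:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Pro",  # hypothetical model id
    messages=[{"role": "user", "content": "Plan a refactor of the auth module."}],
    # Hypothetical mode switch: DeepSeek chat templates commonly take a
    # "thinking" kwarg; the exact key and values for V4-Pro may differ.
    extra_body={"chat_template_kwargs": {"thinking": "high"}},
)
print(resp.choices[0].message.content)
```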
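The refactoring agent rides on the same endpoint using the standard OpenAI tools schema (tool parsing must be enabled on the vLLM server side). A sketch of one agent turn; the `run_tests` tool is a made-up example, not from the tutorial:

```python
from openai import OpenAI

client = OpenAI(base_url="http://vllm-leader:8000/v1", api_key="EMPTY")

# A single made-up tool the agent can call; a real agent registers many.
tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",
        "description": "Run the project's test suite and return failures.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Pro",
    messages=[{"role": "user", "content": "Fix the failing tests under src/."}],
    tools=tools,
)

# The model may respond with tool calls instead of plain text; the agent
# loop executes each call and feeds the results back as tool messages.
for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```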