This repository demonstrates how to use vCluster with NVIDIA KAI Scheduler to achieve isolated, team-specific GPU scheduling on a single Kubernetes cluster. With vCluster’s virtual scheduler feature, different teams can run different KAI scheduler versions simultaneously without interfering with each other.
MANDATORY: Before proceeding with this demo, you must set up a GPU-enabled Kubernetes cluster on GKE by following this blog post:
📖 How to Set Up a GPU‑Enabled Kubernetes Cluster on GKE by Hrittik Roy
This blog provides step-by-step instructions for the full GPU-enabled GKE setup.
Do not proceed until you have completed the GKE setup and verified that GPUs are accessible in your cluster.
KAI (Kubernetes AI) is an advanced Kubernetes scheduler designed for GPU workload optimization. Key features include:
| Feature | Benefit |
|---|---|
| Fractional GPU allocation | Share a single GPU between multiple workloads (e.g., 0.5 training, 0.25 inference, 0.25 dev) |
| Queue-based scheduling | Hierarchical resource management with fair sharing |
| Topology awareness | Optimize scheduling decisions based on hardware layout |
| Fair sharing | Prevent resource monopolization across teams |
KAI was open-sourced in 2025, bringing enterprise-grade GPU management to the Kubernetes community.
vCluster creates isolated Kubernetes clusters that run inside a namespace of a host Kubernetes cluster. Think of it as “Kubernetes in Kubernetes” - each vCluster has its own control plane (API server, scheduler, syncer) but leverages the host cluster’s compute resources.
With the virtual scheduler feature enabled, vCluster can run custom schedulers (like KAI) completely isolated within each virtual cluster. This lets each team run its own scheduler version, in its own virtual cluster, without any changes to the host.
This demo showcases two scenarios: fractional GPU sharing with KAI inside a single vCluster, and multiple teams running different KAI versions side by side.
Understanding typical GPU workloads helps contextualize KAI’s fractional allocation:
| Workload Type | Examples | Typical GPU Usage |
|---|---|---|
| Model Training | Fine-tuning LLMs, Deep Learning | 100% for hours/days |
| Image Generation | Stable Diffusion | ~50% GPU |
| LLM Inference | ChatGPT API, Claude API | 25-75% depending on model |
| Video Processing | Transcoding, streaming | Variable 20-80% |
| CUDA Development | Jupyter notebooks, testing | Often < 20% |
| Batch Processing | Scientific computing | Spikes to 100% |
git clone <repository-url>
cd vcluster-kai-demo
After completing the GKE setup, verify that your cluster has GPU access:
# Confirm you are on the host (GKE) cluster context
kubectl config current-context | sed 's/^/CURRENT_CONTEXT: /'
# Test GPU accessibility
kubectl run gpu-verify --image=nvidia/cuda:12.2.0-base-ubuntu20.04 \
--rm -it --restart=Never \
--overrides='{"spec":{"runtimeClassName":"nvidia","nodeSelector":{"nvidia.com/gpu.present":"true"}}}' \
-- nvidia-smi -L
You should see output listing available GPUs (e.g., “GPU 0: NVIDIA Tesla T4”).
Install the NVIDIA device plugin on your GKE cluster:
# Install NVIDIA device plugin
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.17.4/deployments/static/nvidia-device-plugin.yml
# Wait for device plugin to be ready
kubectl wait --for=condition=ready pod -n kube-system \
-l name=nvidia-device-plugin-ds --timeout=300s
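Once the plugin is registered, GPUs should show up in each GPU node's allocatable resources. A quick way to confirm:
# Each GPU node should report a non-zero nvidia.com/gpu count
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'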
Create a RuntimeClass that will be synced by vCluster:
kubectl apply -f - <<EOF
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia
EOF
Verify the RuntimeClass was created:
kubectl get runtimeclass nvidia
If you haven’t already installed the vCluster CLI, follow the official installation guide.
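For reference, the quick install on Linux amd64 looks like this (adjust the binary name for your OS and architecture; the guide also covers Homebrew and other methods):
# Download the vCluster CLI and place it on your PATH
curl -L -o vcluster "https://github.com/loft-sh/vcluster/releases/latest/download/vcluster-linux-amd64"
sudo install -c -m 0755 vcluster /usr/local/bin
vcluster version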
This repository includes pre-configured manifests in the manifests/ directory:
vCluster configuration (manifests/kai-vcluster.yaml)
This file configures vCluster with the virtual scheduler enabled:
experimental:
  syncSettings:
    setOwner: false # Required for KAI pod-grouper
controlPlane:
  advanced:
    virtualScheduler:
      enabled: true # Runs scheduler inside vCluster
sync:
  fromHost:
    nodes:
      enabled: true # Syncs host nodes for label detection
    runtimeClasses:
      enabled: true # Syncs NVIDIA runtime
    # Auto-enabled with virtual scheduler:
    csiDrivers:
      enabled: auto
    csiNodes:
      enabled: auto
    csiStorageCapacities:
      enabled: auto
Key configuration highlights:
- virtualScheduler.enabled: true - runs the scheduler inside the vCluster
- syncSettings.setOwner: false - required for the KAI pod-grouper

Virtual scheduler benefits: each vCluster schedules its own pods in complete isolation, so teams can upgrade, downgrade, or swap schedulers without touching the host cluster or each other.
Queue definitions (manifests/queues.yaml)
Defines two queues for hierarchical resource management:
- default - parent queue with unlimited quota
- test - child queue inheriting from default

Queues enable fair sharing and prevent resource monopolization. A sketch of what these queue objects look like follows this list.
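For context, KAI queues are custom resources. The repository's actual file may differ, but based on the upstream KAI quickstart, manifests/queues.yaml likely resembles this sketch (the scheduling.run.ai/v2 API group and quota fields come from KAI's examples, where -1 means unbounded):
apiVersion: scheduling.run.ai/v2
kind: Queue
metadata:
  name: default
spec:
  resources:
    cpu:
      quota: -1
      limit: -1
      overQuotaWeight: 1
    gpu:
      quota: -1
      limit: -1
      overQuotaWeight: 1
    memory:
      quota: -1
      limit: -1
      overQuotaWeight: 1
---
apiVersion: scheduling.run.ai/v2
kind: Queue
metadata:
  name: test
spec:
  parentQueue: default # child queue inherits from default
  resources:
    cpu:
      quota: -1
      limit: -1
      overQuotaWeight: 1
    gpu:
      quota: -1
      limit: -1
      overQuotaWeight: 1
    memory:
      quota: -1
      limit: -1
      overQuotaWeight: 1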
GPU demo pods (manifests/gpu-demo-pod1.yaml and manifests/gpu-demo-pod2.yaml)
Two sample pods demonstrating fractional GPU allocation.
Both pods use the following settings (a hedged example manifest appears after this list):
- schedulerName: kai-scheduler - use KAI for scheduling
- runtimeClassName: nvidia - use the NVIDIA container runtime
- nodeSelector - target GPU-enabled nodes
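For illustration, a pod like gpu-demo-pod1 might look like the sketch below. This is not the repository's actual manifest: the queue label, the 0.5 fraction, the node selector value, and the container image are assumptions pieced together from commands used elsewhere in this demo (the annotation key mirrors the verification command in the next step):
apiVersion: v1
kind: Pod
metadata:
  name: gpu-demo-pod1
  labels:
    app: gpu-demo # matched by the -l app=gpu-demo queries below
    kai.scheduler/queue: test # assumption: submit to the "test" child queue
  annotations:
    kai.scheduler/gpu-fraction: "0.5" # assumption: request half of one GPU
spec:
  schedulerName: kai-scheduler # hand scheduling decisions to KAI
  runtimeClassName: nvidia # use the NVIDIA container runtime
  nodeSelector:
    nvidia.com/gpu.present: "true" # target GPU-enabled nodes
  containers:
  - name: cuda
    image: nvidia/cuda:12.2.0-base-ubuntu20.04
    command: ["sh", "-c", "nvidia-smi -L && sleep infinity"]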
# Create vCluster with virtual scheduler enabled
vcluster create kai-isolated --values manifests/kai-vcluster.yaml
This command creates a namespace on the host cluster and deploys a vCluster named kai-isolated into it, with the virtual scheduler enabled.
kubectl config current-context | sed 's/^/CURRENT_CONTEXT: /'
# Connect to vCluster first
vcluster connect kai-isolated
# Install KAI - it will be THE scheduler for this vCluster
KAI_VERSION=v0.7.11
helm upgrade -i kai-scheduler \
oci://ghcr.io/nvidia/kai-scheduler/kai-scheduler \
-n kai-scheduler --create-namespace \
--version $KAI_VERSION \
--set "global.gpuSharing=true"
kubectl wait --for=condition=ready pod -n kai-scheduler --all --timeout=120s
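To see what was installed, list the KAI components (exact names vary by chart version; a scheduler, binder, and pod-grouper are typical):
kubectl get pods -n kai-scheduler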
# Apply queues and deploy two pods with different GPU fractions
kubectl apply -f manifests/queues.yaml
kubectl apply -f manifests/gpu-demo-pod1.yaml
kubectl apply -f manifests/gpu-demo-pod2.yaml
kubectl wait --for=condition=ready pod -l app=gpu-demo -n default --timeout=120s
# Show both pods sharing the GPU
kubectl get pods -l app=gpu-demo -o custom-columns=NAME:.metadata.name,FRACTION:.metadata.annotations."kai\.scheduler/gpu-fraction",STATUS:.status.phase
Both pods should report seeing the same GPU, demonstrating fractional GPU sharing.
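You can confirm this directly by running nvidia-smi inside each pod. The pod names below assume the manifests name the pods after their files, gpu-demo-pod1 and gpu-demo-pod2:
# Both pods should print the same GPU (same UUID)
for pod in gpu-demo-pod1 gpu-demo-pod2; do
  echo "--- $pod ---"
  kubectl exec "$pod" -- nvidia-smi -L
done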
This scenario demonstrates how different teams can run different KAI scheduler versions simultaneously.
kubectl config current-context | sed 's/^/CURRENT_CONTEXT: /'
# Disconnect from vCluster
vcluster disconnect
# Delete the entire vCluster (timed)
time vcluster delete kai-isolated --delete-namespace
kubectl config current-context | sed 's/^/CURRENT_CONTEXT: /'
# Create multiple vClusters for different teams using existing config
# Team 1: Stable version
vcluster create team-stable --values manifests/kai-vcluster.yaml --connect=false &
# Team 2: Beta version
vcluster create team-beta --values manifests/kai-vcluster.yaml --connect=false &
# Wait for both to create
wait
kubectl config current-context | sed 's/^/CURRENT_CONTEXT: /'
# Using "vcluster connect <name> -- <command>" runs each command against the
# target vCluster without switching the shared kubectl context, so the two
# installs can safely run in parallel.
# Team Stable: v0.7.11 (stable)
vcluster connect team-stable -- helm upgrade -i kai-scheduler \
oci://ghcr.io/nvidia/kai-scheduler/kai-scheduler \
-n kai-scheduler --create-namespace \
--version v0.7.11 --wait &
STABLE_PID=$!
# Team Beta: v0.9.3 (testing new features)
vcluster connect team-beta -- helm upgrade -i kai-scheduler \
oci://ghcr.io/nvidia/kai-scheduler/kai-scheduler \
-n kai-scheduler --create-namespace \
--version v0.9.3 --wait &
BETA_PID=$!
# Wait for both installations
wait $STABLE_PID $BETA_PID
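To confirm each team is running its intended KAI version, query each vCluster's Helm releases with the same pass-through form:
# Chart versions should differ: v0.7.11 for stable, v0.9.3 for beta
vcluster connect team-stable -- helm list -n kai-scheduler
vcluster connect team-beta -- helm list -n kai-scheduler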
kubectl config current-context | sed 's/^/CURRENT_CONTEXT: /'
# Deploy to team-stable (30% + 50% GPU allocation)
vcluster connect team-stable
kubectl apply -f manifests/queues.yaml,manifests/gpu-demo-pod1.yaml,manifests/gpu-demo-pod2.yaml
vcluster disconnect
# Deploy to team-beta (different allocation strategy)
vcluster connect team-beta
kubectl apply -f manifests/queues.yaml,manifests/gpu-demo-pod1.yaml,manifests/gpu-demo-pod2.yaml
vcluster disconnect
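Both teams now have independent schedulers and workloads. A quick check should show the demo pods running in each vCluster:
# List the demo pods inside each team's vCluster
vcluster connect team-stable -- kubectl get pods -l app=gpu-demo
vcluster connect team-beta -- kubectl get pods -l app=gpu-demo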
vcluster list
Each vCluster pod bundles its own control plane: an API server, the syncer, and, with the virtual scheduler enabled, the scheduler itself (here, KAI). Workload compute still comes from the host cluster, so the per-vCluster overhead is essentially this one control-plane pod. You can inspect it from the host context:
# View vCluster pods in host cluster
kubectl config use-context <your-gke-context>
kubectl get pods -A | grep vcluster
# Check resource usage (if metrics-server is installed)
kubectl top pod -A | grep vcluster
| Capability | Before vCluster | With vCluster | Time Saved | Risk Reduced |
|---|---|---|---|---|
| Test scheduler upgrades | 4 hours | 5 minutes | 98% | 100% → 0% |
| Rollback bad changes | 2 hours | 30 seconds | 99% | Critical → None |
| A/B test versions | Not possible | Easy | N/A | High → Zero |
| Per-team schedulers | Days | Minutes | 99% | Complex → Simple |
| GPU sharing validation | Weeks | Hours | 95% | High → None |
Challenge: Testing new KAI scheduler versions in production is risky.
Solution: Spin up a throwaway vCluster, install the candidate KAI version there, validate it against real GPU nodes, and delete the vCluster when done (Scenario 1).
Challenge: ML Team needs KAI v0.9.3 features, Research Team requires stable v0.7.11.
Solution: Give each team its own vCluster with its own KAI release (Scenario 2); both versions run side by side on the same hosts without conflict.
Challenge: Developers need to test KAI changes without impacting production.
Solution: Developers create short-lived vClusters, iterate on scheduler changes in full isolation, and tear them down in seconds, never touching the production scheduler.
# Check scheduler logs
kubectl logs -n kai-scheduler -l app=kai-scheduler
# Check pod events
kubectl describe pod <pod-name>
# Verify GPU node labels
kubectl get nodes --show-labels | grep gpu
# List all vCluster contexts
vcluster list
# Reconnect to a vCluster
vcluster connect <vcluster-name>
# Disconnect and return to host context
vcluster disconnect
Ensure the NVIDIA RuntimeClass exists in your GKE cluster:
# Check RuntimeClass
kubectl get runtimeclass nvidia
# If missing, refer back to the GKE GPU setup blog
# Delete a specific vCluster
vcluster delete kai-isolated --delete-namespace
# Delete all demo vClusters
vcluster delete team-stable --delete-namespace
vcluster delete team-beta --delete-namespace
To avoid ongoing GCP charges, delete the entire GKE cluster when done:
# Get your cluster name and zone
gcloud container clusters list
# Delete the cluster (replace with your cluster name and zone)
gcloud container clusters delete <CLUSTER_NAME> --zone=<ZONE>
# Example:
# gcloud container clusters delete gpu-cluster --zone=us-central1-a
Warning: This will delete all workloads, vClusters, and data in the cluster. Make sure you’ve backed up anything important before proceeding.
# Verify cluster deletion
gcloud container clusters list
# Remove kubectl contexts (optional)
kubectl config get-contexts
kubectl config delete-context <context-name>
1. Pod submitted with schedulerName: kai-scheduler
2. vCluster's virtual scheduler sees pod
3. KAI scheduler (inside vCluster) makes decision
4. Pod scheduled to synced node (from host cluster)
5. Syncer translates pod to host cluster
6. Host cluster runs the pod on GPU node
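You can watch step 5 from the host context: synced pods appear under translated names. The pattern below is vCluster's default naming convention; your exact namespace and pod names will differ:
# From the HOST context, find the translated demo pods
kubectl get pods -A | grep gpu-demo
# e.g. vcluster-team-stable   gpu-demo-pod1-x-default-x-team-stable   1/1   Running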
From Host → vCluster: nodes, RuntimeClasses, and CSI objects (drivers, nodes, and storage capacities) are synced in, per kai-vcluster.yaml, so the virtual scheduler sees the real hardware.
From vCluster → Host: pods scheduled inside the vCluster are translated by the syncer and created on the host, where they actually run.
Contributions are welcome! Please open an issue or submit a pull request.
This demo repository is provided as-is for educational purposes.