
vCluster + NVIDIA KAI Scheduler Demo

This repository demonstrates how to use vCluster with NVIDIA KAI Scheduler to achieve isolated, team-specific GPU scheduling on a single Kubernetes cluster. With vCluster’s virtual scheduler feature, different teams can run different KAI scheduler versions simultaneously without interfering with each other.

Prerequisites

MANDATORY: Before proceeding with this demo, you must set up a GPU-enabled Kubernetes cluster on GKE by following this blog post:

📖 How to Set Up a GPU‑Enabled Kubernetes Cluster on GKE by Hrittik Roy

This blog provides step-by-step instructions for provisioning the GPU-enabled GKE cluster that this demo builds on.

Do not proceed until you have completed the GKE setup and verified that GPUs are accessible in your cluster.

What is NVIDIA KAI Scheduler?

KAI (Kubernetes AI) is an advanced Kubernetes scheduler designed for GPU workload optimization. Key features include:

| Feature | Benefit |
| --- | --- |
| Fractional GPU allocation | Share a single GPU between multiple workloads (e.g., 0.5 training, 0.25 inference, 0.25 dev) |
| Queue-based scheduling | Hierarchical resource management with fair sharing |
| Topology awareness | Optimize scheduling decisions based on hardware layout |
| Fair sharing | Prevent resource monopolization across teams |

KAI was open-sourced in 2025, bringing enterprise-grade GPU management to the Kubernetes community.

What is vCluster?

vCluster creates isolated Kubernetes clusters that run inside a namespace of a host Kubernetes cluster. Think of it as “Kubernetes in Kubernetes” - each vCluster has its own control plane (API server, scheduler, syncer) but leverages the host cluster’s compute resources.

vCluster Virtual Scheduler

With the virtual scheduler feature enabled, vCluster runs a custom scheduler (such as KAI) completely isolated inside each virtual cluster. Each team can therefore choose its own scheduler version and configuration without affecting the host cluster or any other virtual cluster.

Demo Overview

This demo showcases:

  1. ✅ Deploying vCluster with virtual scheduler enabled
  2. ✅ Installing KAI scheduler inside a vCluster
  3. ✅ Configuring KAI queues for resource management
  4. ✅ Deploying GPU workloads with fractional GPU allocation
  5. ✅ Running multiple teams with different KAI versions simultaneously

What Actually Runs on GPUs?

Understanding typical GPU workloads helps contextualize KAI’s fractional allocation:

| Workload Type | Examples | Typical GPU Usage |
| --- | --- | --- |
| Model Training | Fine-tuning LLMs, deep learning | 100% for hours/days |
| Image Generation | Stable Diffusion | ~50% GPU |
| LLM Inference | ChatGPT API, Claude API | 25-75% depending on model |
| Video Processing | Transcoding, streaming | Variable, 20-80% |
| CUDA Development | Jupyter notebooks, testing | Often < 20% |
| Batch Processing | Scientific computing | Spikes to 100% |

Setup Instructions

1. Clone This Repository

git clone <repository-url>
cd vcluster-kai-demo

2. Verify GPU Access

After completing the GKE setup, verify that your cluster has GPU access:

# Confirm you are pointed at the host GKE cluster
kubectl config current-context | sed 's/^/CURRENT_CONTEXT: /'
# Test GPU accessibility
kubectl run gpu-verify --image=nvidia/cuda:12.2.0-base-ubuntu20.04 \
  --rm -it --restart=Never \
  --overrides='{"spec":{"runtimeClassName":"nvidia","nodeSelector":{"nvidia.com/gpu.present":"true"}}}' \
  -- nvidia-smi -L

You should see output listing available GPUs (e.g., “GPU 0: NVIDIA Tesla T4”).

3. Install NVIDIA Device Plugin

Install the NVIDIA device plugin on your GKE cluster:

# Install NVIDIA device plugin
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.17.4/deployments/static/nvidia-device-plugin.yml

# Wait for device plugin to be ready
kubectl wait --for=condition=ready pod -n kube-system \
  -l name=nvidia-device-plugin-ds --timeout=300s

4. Create RuntimeClass for NVIDIA Containers

Create a RuntimeClass that will be synced by vCluster:

kubectl apply -f - <<EOF
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia
EOF

Verify the RuntimeClass was created:

kubectl get runtimeclass nvidia

5. Install vCluster CLI

If you haven’t already installed the vCluster CLI, follow the official installation guide.
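
If it helps, two common install paths are sketched below; check the official guide for the latest instructions, and note that the release asset name assumes Linux on amd64:

# Option 1: Homebrew (macOS/Linux)
brew install vcluster

# Option 2: Download the binary directly
curl -L -o vcluster \
  "https://github.com/loft-sh/vcluster/releases/latest/download/vcluster-linux-amd64"
chmod +x vcluster
sudo mv vcluster /usr/local/bin/vcluster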

6. Review Configuration Files

This repository includes pre-configured manifests in the manifests/ directory:

vCluster Configuration (manifests/kai-vcluster.yaml)

This file configures vCluster with the virtual scheduler enabled:

experimental:
  syncSettings:
    setOwner: false  # Required for KAI pod-grouper

controlPlane:
  advanced:
    virtualScheduler:
      enabled: true  # Runs scheduler inside vCluster

sync:
  fromHost:
    nodes:
      enabled: true  # Syncs host nodes for label detection
    runtimeClasses:
      enabled: true  # Syncs NVIDIA runtime
    # Auto-enabled with virtual scheduler:
    csiDrivers:
      enabled: auto
    csiNodes:
      enabled: auto
    csiStorageCapacities:
      enabled: auto

Key configuration highlights:

  - setOwner: false is required so the KAI pod-grouper can group pods correctly
  - virtualScheduler.enabled: true runs the scheduler inside the vCluster rather than relying on the host scheduler
  - Syncing nodes and runtimeClasses from the host lets KAI see the GPU nodes and the nvidia runtime

With the virtual scheduler enabled, each vCluster makes its own scheduling decisions, so every team can run its own scheduler version without touching the host cluster's default scheduler.

KAI Queue Configuration (manifests/queues.yaml)

Defines two queues for hierarchical resource management: a top-level queue plus a child (leaf) queue that workloads are submitted to. Queues enable fair sharing and prevent any single team or user from monopolizing GPU resources.
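
For orientation, here is a minimal sketch of such a hierarchy using the Queue resource from the KAI scheduler quickstart; the queue names and the unlimited (-1) quotas are illustrative, not the exact contents of manifests/queues.yaml:

apiVersion: scheduling.run.ai/v2
kind: Queue
metadata:
  name: demo                      # illustrative top-level queue
spec:
  resources:
    gpu: {quota: -1, limit: -1, overQuotaWeight: 1}    # -1 = no hard quota
    cpu: {quota: -1, limit: -1, overQuotaWeight: 1}
    memory: {quota: -1, limit: -1, overQuotaWeight: 1}
---
apiVersion: scheduling.run.ai/v2
kind: Queue
metadata:
  name: demo-team                 # illustrative leaf queue that workloads reference
spec:
  parentQueue: demo
  resources:
    gpu: {quota: -1, limit: -1, overQuotaWeight: 1}
    cpu: {quota: -1, limit: -1, overQuotaWeight: 1}
    memory: {quota: -1, limit: -1, overQuotaWeight: 1}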

GPU Demo Pods (manifests/gpu-demo-pod1.yaml and manifests/gpu-demo-pod2.yaml)

Two sample pods that request different fractions of the same physical GPU (30% and 50%), demonstrating fractional allocation.

Both pods set schedulerName: kai-scheduler, carry a kai.scheduler/gpu-fraction annotation, and run with the nvidia RuntimeClass synced from the host.
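
As a rough sketch of how those pieces fit together (not the exact contents of the repository manifests): the pod name, queue name, and fraction value are illustrative, and the kai.scheduler/queue label follows the KAI quickstart convention:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-demo-pod1                     # illustrative name
  labels:
    app: gpu-demo
    kai.scheduler/queue: demo-team        # must reference a queue from queues.yaml
  annotations:
    kai.scheduler/gpu-fraction: "0.5"     # request 50% of one GPU
spec:
  schedulerName: kai-scheduler            # hand the pod to KAI, not the default scheduler
  runtimeClassName: nvidia                # NVIDIA RuntimeClass synced from the host
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-base-ubuntu20.04
      command: ["bash", "-c", "nvidia-smi -L && sleep 3600"]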

Running the Demo

Single Team Deployment

Step 1: Create vCluster with Virtual Scheduler

# Create vCluster with virtual scheduler enabled
vcluster create kai-isolated --values manifests/kai-vcluster.yaml

This command deploys a vCluster named kai-isolated in its own namespace on the host cluster, with the virtual scheduler enabled and node/RuntimeClass syncing configured as shown above.
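
Before connecting, you can sanity-check that the virtual control plane came up. This assumes the default namespace naming convention vcluster-<name>:

# The vCluster control-plane pod runs on the host cluster
kubectl get pods -n vcluster-kai-isolated

# The CLI's own view of running vClusters
vcluster list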

Step 2: Install KAI Scheduler Inside vCluster

kubectl config current-context | sed 's/^/CURRENT_CONTEXT: /'
# Connect to vCluster first
vcluster connect kai-isolated

# Install KAI - it will be THE scheduler for this vCluster
KAI_VERSION=v0.7.11
helm upgrade -i kai-scheduler \
  oci://ghcr.io/nvidia/kai-scheduler/kai-scheduler \
  -n kai-scheduler --create-namespace \
  --version $KAI_VERSION \
  --set "global.gpuSharing=true"

kubectl wait --for=condition=ready pod -n kai-scheduler --all --timeout=120s

Step 3: Deploy GPU Workloads and Configure Queues

# Apply queues and deploy two pods with different GPU fractions
kubectl apply -f manifests/queues.yaml
kubectl apply -f manifests/gpu-demo-pod1.yaml
kubectl apply -f manifests/gpu-demo-pod2.yaml

kubectl wait --for=condition=ready pod -l app=gpu-demo -n default --timeout=120s

# Show both pods sharing the GPU
kubectl get pods -l app=gpu-demo -o custom-columns=NAME:.metadata.name,FRACTION:.metadata.annotations."kai\.scheduler/gpu-fraction",STATUS:.status.phase

Both pods should report seeing the same GPU, demonstrating fractional GPU sharing.
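
To double-check that the pods really share one physical device, compare the GPU UUIDs they see. This assumes nvidia-smi is available inside the containers (the NVIDIA container runtime injects it into GPU pods):

for p in $(kubectl get pods -l app=gpu-demo -o name); do
  echo "== $p"
  kubectl exec "${p#pod/}" -- nvidia-smi -L
done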

Multi-Team Deployment (Different KAI Versions)

This scenario demonstrates how different teams can run different KAI scheduler versions simultaneously.

Step 1: Disconnect and Clean Up Previous vCluster

kubectl config current-context | sed 's/^/CURRENT_CONTEXT: /'
# Disconnect from vCluster
vcluster disconnect

# Delete the entire vCluster (timed)
time vcluster delete kai-isolated --delete-namespace

Step 2: Create Multiple vClusters

kubectl config current-context | sed 's/^/CURRENT_CONTEXT: /'
# Create multiple vClusters for different teams using existing config
# Team 1: Stable version
vcluster create team-stable --values manifests/kai-vcluster.yaml --connect=false &

# Team 2: Beta version
vcluster create team-beta --values manifests/kai-vcluster.yaml --connect=false &

# Wait for both to create
wait

Step 3: Install Different KAI Versions

kubectl config current-context | sed 's/^/CURRENT_CONTEXT: /'
# Team Stable: v0.7.11 (stable)
# Using "vcluster connect <name> -- <command>" runs the command against that
# vCluster without switching your kubeconfig context, so both installs can
# safely run in parallel.
vcluster connect team-stable -- helm upgrade -i kai-scheduler \
  oci://ghcr.io/nvidia/kai-scheduler/kai-scheduler \
  -n kai-scheduler --create-namespace \
  --version v0.7.11 --set "global.gpuSharing=true" --wait &
STABLE_PID=$!

# Team Beta: v0.9.3 (testing new features)
vcluster connect team-beta -- helm upgrade -i kai-scheduler \
  oci://ghcr.io/nvidia/kai-scheduler/kai-scheduler \
  -n kai-scheduler --create-namespace \
  --version v0.9.3 --set "global.gpuSharing=true" --wait &
BETA_PID=$!

# Wait for both installations to finish
wait $STABLE_PID $BETA_PID

Step 4: Deploy Workloads to Both Teams

kubectl config current-context | sed 's/^/CURRENT_CONTEXT: /'
# Deploy to team-stable (30% + 50% GPU allocation)
vcluster connect team-stable
kubectl apply -f manifests/queues.yaml,manifests/gpu-demo-pod1.yaml,manifests/gpu-demo-pod2.yaml
vcluster disconnect

# Deploy the same workloads to team-beta (scheduled by its own KAI v0.9.3)
vcluster connect team-beta
kubectl apply -f manifests/queues.yaml,manifests/gpu-demo-pod1.yaml,manifests/gpu-demo-pod2.yaml
vcluster disconnect

Step 5: Verify Parallel Operations

vcluster list
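
vcluster list only confirms that both virtual clusters are up. To verify that each team runs its own KAI release, query the Helm release inside each vCluster (a sketch using the same vcluster connect <name> -- <command> pattern as above):

for team in team-stable team-beta; do
  echo "== $team"
  vcluster connect $team -- helm list -n kai-scheduler
done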

Understanding vCluster Resource Footprint

Each vCluster runs as a pod in its own namespace on the host cluster and contains its own control plane: an API server, a scheduler (here, KAI also runs inside the vCluster), and the syncer that translates resources to and from the host.

To see the actual footprint, inspect it from the host:

# View vCluster pods in host cluster
kubectl config use-context <your-gke-context>
kubectl get pods -A | grep vcluster

# Check resource usage (if metrics-server is installed)
kubectl top pod -A | grep vcluster

Operational Benefits

| Capability | Before vCluster | With vCluster | Time Saved | Risk Reduced |
| --- | --- | --- | --- | --- |
| Test scheduler upgrades | 4 hours | 5 minutes | 98% | 100% → 0% |
| Rollback bad changes | 2 hours | 30 seconds | 99% | Critical → None |
| A/B test versions | Not possible | Easy | N/A | High → Zero |
| Per-team schedulers | Days | Minutes | 99% | Complex → Simple |
| GPU sharing validation | Weeks | Hours | 95% | High → None |

Real-World Use Cases

Use Case 1: Safe Scheduler Upgrades

Challenge: Testing new KAI scheduler versions in production is risky.

Solution:

  1. Create a vCluster with the new KAI version
  2. Deploy representative workloads
  3. Validate behavior over days/weeks
  4. If successful, upgrade main cluster
  5. If failed, delete vCluster (30 seconds)

Use Case 2: Multi-Team Independence

Challenge: ML Team needs KAI v0.9.3 features, Research Team requires stable v0.7.11.

Solution: give each team its own vCluster and install the KAI version it needs, exactly as in the multi-team deployment above. Both teams share the same GPU nodes without interfering with each other.

Use Case 3: Development and Testing

Challenge: Developers need to test KAI changes without impacting production.

Solution: spin up a disposable vCluster, install the KAI build under test, run representative workloads, and delete the vCluster when finished. Production workloads on the host cluster are never touched.

Troubleshooting

Pods Stuck in Pending

# Check scheduler logs
kubectl logs -n kai-scheduler -l app=kai-scheduler

# Check pod events
kubectl describe pod <pod-name>

# Verify GPU node labels
kubectl get nodes --show-labels | grep gpu

vCluster Connection Issues

# List all vCluster contexts
vcluster list

# Reconnect to a vCluster
vcluster connect <vcluster-name>

# Disconnect and return to host context
vcluster disconnect

GPU RuntimeClass Not Found

Ensure the NVIDIA RuntimeClass exists in your GKE cluster:

# Check RuntimeClass
kubectl get runtimeclass nvidia

# If missing, refer back to the GKE GPU setup blog

Cleanup

Delete Individual vClusters

# Delete a specific vCluster
vcluster delete kai-isolated --delete-namespace

# Delete all demo vClusters
vcluster delete team-stable --delete-namespace
vcluster delete team-beta --delete-namespace

Tear Down GKE Cluster

To avoid ongoing GCP charges, delete the entire GKE cluster when done:

# Get your cluster name and zone
gcloud container clusters list

# Delete the cluster (replace with your cluster name and zone)
gcloud container clusters delete <CLUSTER_NAME> --zone=<ZONE>

# Example:
# gcloud container clusters delete gpu-cluster --zone=us-central1-a

Warning: This will delete all workloads, vClusters, and data in the cluster. Make sure you’ve backed up anything important before proceeding.

Verify Cleanup

# Verify cluster deletion
gcloud container clusters list

# Remove kubectl contexts (optional)
kubectl config get-contexts
kubectl config delete-context <context-name>

Key Takeaways

  1. vCluster Virtual Scheduler enables true scheduling isolation per team
  2. KAI Scheduler provides fractional GPU allocation and queue-based resource management
  3. Zero host impact - each vCluster makes independent scheduling decisions
  4. Fast iteration - create, test, and delete scheduler versions in minutes
  5. Production-safe - test new features without risking existing workloads

Architecture Highlights

Scheduling Flow with vCluster + KAI

1. Pod submitted with schedulerName: kai-scheduler
2. vCluster's virtual scheduler sees pod
3. KAI scheduler (inside vCluster) makes decision
4. Pod scheduled to synced node (from host cluster)
5. Syncer translates pod to host cluster
6. Host cluster runs the pod on GPU node
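
To see this in practice, you can check which scheduler the pods requested and how they were bound. This is a generic kubectl check, not specific to this repository; the source/from column of the Scheduled events typically names the scheduler that bound the pod:

# While connected to the vCluster
kubectl get pods -l app=gpu-demo \
  -o custom-columns=NAME:.metadata.name,SCHEDULER:.spec.schedulerName,NODE:.spec.nodeName

kubectl get events --field-selector reason=Scheduled -o wide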

Resource Synchronization

From Host → vCluster: nodes, the nvidia RuntimeClass, and CSI drivers/nodes/storage capacities, per the sync.fromHost settings shown earlier.

From vCluster → Host: the workload pods themselves. Once KAI binds a pod to a node inside the vCluster, the syncer creates the corresponding pod in the vCluster's host namespace so the host cluster actually runs it.
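
To watch the syncer at work, list the pods from the host context. This assumes the default namespace convention vcluster-<name> and the translated pod-name pattern <pod>-x-<namespace>-x-<vcluster> used by the syncer; the pod name shown is illustrative:

# From the host cluster context
kubectl get pods -n vcluster-team-stable
# Expect entries like: gpu-demo-pod1-x-default-x-team-stable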

Additional Resources

Contributing

Contributions are welcome! Please open an issue or submit a pull request.

License

This demo repository is provided as-is for educational purposes.