This repository demonstrates how to use vCluster with NVIDIA KAI Scheduler to achieve isolated, team-specific GPU scheduling on a single Kubernetes cluster. With vCluster’s virtual scheduler feature, different teams can run different KAI scheduler versions simultaneously without interfering with each other.
MANDATORY: Before proceeding with this demo, you must set up a GPU-enabled Kubernetes cluster on GKE by following this blog post:
📖 How to Set Up a GPU‑Enabled Kubernetes Cluster on GKE by Hrittik Roy
This blog provides step-by-step instructions for the full GPU-enabled GKE setup.
Do not proceed until you have completed the GKE setup and verified that GPUs are accessible in your cluster.
KAI (Kubernetes AI) is an advanced Kubernetes scheduler designed for GPU workload optimization. Key features include:
| Feature | Benefit |
|---|---|
| Fractional GPU allocation | Share a single GPU between multiple workloads (e.g., 0.5 training, 0.25 inference, 0.25 dev) |
| Queue-based scheduling | Hierarchical resource management with fair sharing |
| Topology awareness | Optimize scheduling decisions based on hardware layout |
| Fair sharing | Prevent resource monopolization across teams |
KAI was open-sourced in 2025, bringing enterprise-grade GPU management to the Kubernetes community.
vCluster creates isolated Kubernetes clusters that run inside a namespace of a host Kubernetes cluster. Think of it as “Kubernetes in Kubernetes” - each vCluster has its own control plane (API server, scheduler, syncer) but leverages the host cluster’s compute resources.
With the virtual scheduler feature enabled, vCluster can run custom schedulers (like KAI) completely isolated within each virtual cluster. This lets each team run its own scheduler version, in its own virtual cluster, without any changes to the host.
This demo showcases two scenarios: fractional GPU sharing with KAI inside a single vCluster, and multiple teams running different KAI versions side by side.
Understanding typical GPU workloads helps contextualize KAI’s fractional allocation:
| Workload Type | Examples | Typical GPU Usage |
|---|---|---|
| Model Training | Fine-tuning LLMs, Deep Learning | 100% for hours/days |
| Image Generation | Stable Diffusion | ~50% GPU |
| LLM Inference | ChatGPT API, Claude API | 25-75% depending on model |
| Video Processing | Transcoding, streaming | Variable 20-80% |
| CUDA Development | Jupyter notebooks, testing | Often < 20% |
| Batch Processing | Scientific computing | Spikes to 100% |
git clone <repository-url>
cd vcluster-kai-demo
After completing the GKE setup, verify that your cluster has GPU access:
# Confirm you are on the host (GKE) cluster context
kubectl config current-context | sed 's/^/CURRENT_CONTEXT: /'
# Test GPU accessibility
kubectl run gpu-verify --image=nvidia/cuda:12.2.0-base-ubuntu20.04 \
--rm -it --restart=Never \
--overrides='{"spec":{"runtimeClassName":"nvidia","nodeSelector":{"nvidia.com/gpu.present":"true"}}}' \
-- nvidia-smi -L
You should see output listing available GPUs (e.g., “GPU 0: NVIDIA Tesla T4”).
Install the NVIDIA device plugin on your GKE cluster:
# Install NVIDIA device plugin
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.17.4/deployments/static/nvidia-device-plugin.yml
# Wait for device plugin to be ready
kubectl wait --for=condition=ready pod -n kube-system \
-l name=nvidia-device-plugin-ds --timeout=300s
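Once the plugin is registered, GPUs should show up in each GPU node's allocatable resources. A quick way to confirm:
# Each GPU node should report a non-zero nvidia.com/gpu count
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'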
Create a RuntimeClass that will be synced by vCluster:
kubectl apply -f - <<EOF
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia
EOF
Verify the RuntimeClass was created:
kubectl get runtimeclass nvidia
If you haven’t already installed the vCluster CLI, follow the official installation guide.
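For reference, the quick install on Linux amd64 looks like this (adjust the binary name for your OS and architecture; the guide also covers Homebrew and other methods):
# Download the vCluster CLI and place it on your PATH
curl -L -o vcluster "https://github.com/loft-sh/vcluster/releases/latest/download/vcluster-linux-amd64"
sudo install -c -m 0755 vcluster /usr/local/bin
vcluster version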
This repository includes pre-configured manifests in the manifests/ directory:
vCluster configuration (manifests/kai-vcluster.yaml)
This file configures vCluster with the virtual scheduler enabled:
experimental:
  syncSettings:
    setOwner: false # Required for KAI pod-grouper
controlPlane:
  advanced:
    virtualScheduler:
      enabled: true # Runs scheduler inside vCluster
sync:
  fromHost:
    nodes:
      enabled: true # Syncs host nodes for label detection
    runtimeClasses:
      enabled: true # Syncs NVIDIA runtime
    # Auto-enabled with virtual scheduler:
    csiDrivers:
      enabled: auto
    csiNodes:
      enabled: auto
    csiStorageCapacities:
      enabled: auto
Key configuration highlights:
- virtualScheduler.enabled: true - runs the scheduler inside the vCluster
- syncSettings.setOwner: false - required for the KAI pod-grouper

Virtual scheduler benefits: each vCluster schedules its own pods in complete isolation, so teams can upgrade, downgrade, or swap schedulers without touching the host cluster or each other.
Queue definitions (manifests/queues.yaml)
Defines two queues for hierarchical resource management:
- default - parent queue with unlimited quota
- test - child queue inheriting from default

Queues enable fair sharing and prevent resource monopolization. A sketch of what these queue objects look like follows this list.
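For context, KAI queues are custom resources. The repository's actual file may differ, but based on the upstream KAI quickstart, manifests/queues.yaml likely resembles this sketch (the scheduling.run.ai/v2 API group and quota fields come from KAI's examples, where -1 means unbounded):
apiVersion: scheduling.run.ai/v2
kind: Queue
metadata:
  name: default
spec:
  resources:
    cpu:
      quota: -1
      limit: -1
      overQuotaWeight: 1
    gpu:
      quota: -1
      limit: -1
      overQuotaWeight: 1
    memory:
      quota: -1
      limit: -1
      overQuotaWeight: 1
---
apiVersion: scheduling.run.ai/v2
kind: Queue
metadata:
  name: test
spec:
  parentQueue: default # child queue inherits from default
  resources:
    cpu:
      quota: -1
      limit: -1
      overQuotaWeight: 1
    gpu:
      quota: -1
      limit: -1
      overQuotaWeight: 1
    memory:
      quota: -1
      limit: -1
      overQuotaWeight: 1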
GPU demo pods (manifests/gpu-demo-pod1.yaml and manifests/gpu-demo-pod2.yaml)
Two sample pods demonstrating fractional GPU allocation.
Both pods use the following settings (a hedged example manifest appears after this list):
- schedulerName: kai-scheduler - use KAI for scheduling
- runtimeClassName: nvidia - use the NVIDIA container runtime
- nodeSelector - target GPU-enabled nodes
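For illustration, a pod like gpu-demo-pod1 might look like the sketch below. This is not the repository's actual manifest: the queue label, the 0.5 fraction, the node selector value, and the container image are assumptions pieced together from commands used elsewhere in this demo (the annotation key mirrors the verification command in the next step):
apiVersion: v1
kind: Pod
metadata:
  name: gpu-demo-pod1
  labels:
    app: gpu-demo # matched by the -l app=gpu-demo queries below
    kai.scheduler/queue: test # assumption: submit to the "test" child queue
  annotations:
    kai.scheduler/gpu-fraction: "0.5" # assumption: request half of one GPU
spec:
  schedulerName: kai-scheduler # hand scheduling decisions to KAI
  runtimeClassName: nvidia # use the NVIDIA container runtime
  nodeSelector:
    nvidia.com/gpu.present: "true" # target GPU-enabled nodes
  containers:
  - name: cuda
    image: nvidia/cuda:12.2.0-base-ubuntu20.04
    command: ["sh", "-c", "nvidia-smi -L && sleep infinity"]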
# Create vCluster with virtual scheduler enabled
vcluster create kai-isolated --values manifests/kai-vcluster.yaml
This command creates a namespace on the host cluster and deploys a vCluster named kai-isolated into it, with the virtual scheduler enabled.
kubectl config current-context | sed 's/^/CURRENT_CONTEXT: /'
# Connect to vCluster first
vcluster connect kai-isolated
# Install KAI - it will be THE scheduler for this vCluster
KAI_VERSION=v0.7.11
helm upgrade -i kai-scheduler \
oci://ghcr.io/nvidia/kai-scheduler/kai-scheduler \
-n kai-scheduler --create-namespace \
--version $KAI_VERSION \
--set "global.gpuSharing=true"
kubectl wait --for=condition=ready pod -n kai-scheduler --all --timeout=120s
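To see what was installed, list the KAI components (exact names vary by chart version; a scheduler, binder, and pod-grouper are typical):
kubectl get pods -n kai-scheduler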
# Apply queues and deploy two pods with different GPU fractions
kubectl apply -f manifests/queues.yaml
kubectl apply -f manifests/gpu-demo-pod1.yaml
kubectl apply -f manifests/gpu-demo-pod2.yaml
kubectl wait --for=condition=ready pod -l app=gpu-demo -n default --timeout=120s
# Show both pods sharing the GPU
kubectl get pods -l app=gpu-demo -o custom-columns=NAME:.metadata.name,FRACTION:.metadata.annotations."kai\.scheduler/gpu-fraction",STATUS:.status.phase
Both pods should report seeing the same GPU, demonstrating fractional GPU sharing.
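You can confirm this directly by running nvidia-smi inside each pod. The pod names below assume the manifests name the pods after their files, gpu-demo-pod1 and gpu-demo-pod2:
# Both pods should print the same GPU (same UUID)
for pod in gpu-demo-pod1 gpu-demo-pod2; do
  echo "--- $pod ---"
  kubectl exec "$pod" -- nvidia-smi -L
done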
This scenario demonstrates how different teams can run different KAI scheduler versions simultaneously.
kubectl config current-context | sed 's/^/CURRENT_CONTEXT: /'
# Disconnect from vCluster
vcluster disconnect
# Delete the entire vCluster (timed)
time vcluster delete kai-isolated --delete-namespace
kubectl config current-context | sed 's/^/CURRENT_CONTEXT: /'
# Create multiple vClusters for different teams using existing config
# Team 1: Stable version
vcluster create team-stable --values manifests/kai-vcluster.yaml --connect=false &
# Team 2: Beta version
vcluster create team-beta --values manifests/kai-vcluster.yaml --connect=false &
# Wait for both to create
wait
kubectl config current-context | sed 's/^/CURRENT_CONTEXT: /'
# Using "vcluster connect <name> -- <command>" runs each command against the
# target vCluster without switching the shared kubectl context, so the two
# installs can safely run in parallel.
# Team Stable: v0.7.11 (stable)
vcluster connect team-stable -- helm upgrade -i kai-scheduler \
oci://ghcr.io/nvidia/kai-scheduler/kai-scheduler \
-n kai-scheduler --create-namespace \
--version v0.7.11 --wait &
STABLE_PID=$!
# Team Beta: v0.9.3 (testing new features)
vcluster connect team-beta -- helm upgrade -i kai-scheduler \
oci://ghcr.io/nvidia/kai-scheduler/kai-scheduler \
-n kai-scheduler --create-namespace \
--version v0.9.3 --wait &
BETA_PID=$!
# Wait for both installations
wait $STABLE_PID $BETA_PID
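To confirm each team is running its intended KAI version, query each vCluster's Helm releases with the same pass-through form:
# Chart versions should differ: v0.7.11 for stable, v0.9.3 for beta
vcluster connect team-stable -- helm list -n kai-scheduler
vcluster connect team-beta -- helm list -n kai-scheduler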
kubectl config current-context | sed 's/^/CURRENT_CONTEXT: /'
# Deploy to team-stable (30% + 50% GPU allocation)
vcluster connect team-stable
kubectl apply -f manifests/queues.yaml,manifests/gpu-demo-pod1.yaml,manifests/gpu-demo-pod2.yaml
vcluster disconnect
# Deploy to team-beta (different allocation strategy)
vcluster connect team-beta
kubectl apply -f manifests/queues.yaml,manifests/gpu-demo-pod1.yaml,manifests/gpu-demo-pod2.yaml
vcluster disconnect
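Both teams now have independent schedulers and workloads. A quick check should show the demo pods running in each vCluster:
# List the demo pods inside each team's vCluster
vcluster connect team-stable -- kubectl get pods -l app=gpu-demo
vcluster connect team-beta -- kubectl get pods -l app=gpu-demo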
vcluster list
Each vCluster pod bundles its own control plane: an API server, the syncer, and, with the virtual scheduler enabled, the scheduler itself (here, KAI). Workload compute still comes from the host cluster, so the per-vCluster overhead is essentially this one control-plane pod. You can inspect it from the host context:
# View vCluster pods in host cluster
kubectl config use-context <your-gke-context>
kubectl get pods -A | grep vcluster
# Check resource usage (if metrics-server is installed)
kubectl top pod -A | grep vcluster
| Capability | Before vCluster | With vCluster | Time Saved | Risk Reduced |
|---|---|---|---|---|
| Test scheduler upgrades | 4 hours | 5 minutes | 98% | 100% → 0% |
| Rollback bad changes | 2 hours | 30 seconds | 99% | Critical → None |
| A/B test versions | Not possible | Easy | N/A | High → Zero |
| Per-team schedulers | Days | Minutes | 99% | Complex → Simple |
| GPU sharing validation | Weeks | Hours | 95% | High → None |
Challenge: Testing new KAI scheduler versions in production is risky.
Solution: Spin up a throwaway vCluster, install the candidate KAI version there, validate it against real GPU nodes, and delete the vCluster when done (Scenario 1).
Challenge: ML Team needs KAI v0.9.3 features, Research Team requires stable v0.7.11.
Solution: Give each team its own vCluster with its own KAI release (Scenario 2); both versions run side by side on the same hosts without conflict.
Challenge: Developers need to test KAI changes without impacting production.
Solution: Developers create short-lived vClusters, iterate on scheduler changes in full isolation, and tear them down in seconds, never touching the production scheduler.
# Check scheduler logs
kubectl logs -n kai-scheduler -l app=kai-scheduler
# Check pod events
kubectl describe pod <pod-name>
# Verify GPU node labels
kubectl get nodes --show-labels | grep gpu
# List all vCluster contexts
vcluster list
# Reconnect to a vCluster
vcluster connect <vcluster-name>
# Disconnect and return to host context
vcluster disconnect
Ensure the NVIDIA RuntimeClass exists in your GKE cluster:
# Check RuntimeClass
kubectl get runtimeclass nvidia
# If missing, refer back to the GKE GPU setup blog
# Delete a specific vCluster
vcluster delete kai-isolated --delete-namespace
# Delete all demo vClusters
vcluster delete team-stable --delete-namespace
vcluster delete team-beta --delete-namespace
To avoid ongoing GCP charges, delete the entire GKE cluster when done:
# Get your cluster name and zone
gcloud container clusters list
# Delete the cluster (replace with your cluster name and zone)
gcloud container clusters delete <CLUSTER_NAME> --zone=<ZONE>
# Example:
# gcloud container clusters delete gpu-cluster --zone=us-central1-a
Warning: This will delete all workloads, vClusters, and data in the cluster. Make sure you’ve backed up anything important before proceeding.
# Verify cluster deletion
gcloud container clusters list
# Remove kubectl contexts (optional)
kubectl config get-contexts
kubectl config delete-context <context-name>
1. Pod submitted with schedulerName: kai-scheduler
2. vCluster's virtual scheduler sees pod
3. KAI scheduler (inside vCluster) makes decision
4. Pod scheduled to synced node (from host cluster)
5. Syncer translates pod to host cluster
6. Host cluster runs the pod on GPU node
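You can watch step 5 from the host context: synced pods appear under translated names. The pattern below is vCluster's default naming convention; your exact namespace and pod names will differ:
# From the HOST context, find the translated demo pods
kubectl get pods -A | grep gpu-demo
# e.g. vcluster-team-stable   gpu-demo-pod1-x-default-x-team-stable   1/1   Running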
From Host → vCluster: nodes, RuntimeClasses, and CSI objects (drivers, nodes, and storage capacities) are synced in, per kai-vcluster.yaml, so the virtual scheduler sees the real hardware.
From vCluster → Host: pods scheduled inside the vCluster are translated by the syncer and created on the host, where they actually run.
Contributions are welcome! Please open an issue or submit a pull request.
This demo repository is provided as-is for educational purposes.