Assigning GPUs with Dynamic Resource Allocation (DRA)
FEATURE STATE: v1.9.0: Beta
KubeVirt can attach GPUs to virtual machines through Kubernetes Dynamic Resource Allocation (DRA).
With DRA, a GPU is requested through the Kubernetes ResourceClaim API, allocated by an external GPU DRA driver, and attached to the VM through spec.resourceClaims and spec.domain.devices.gpus.
This is an alternative to the device-plugin flow documented in Host Devices Assignment.
With DRA, you do not need to allowlist GPUs in permittedHostDevices.
Instead, the DRA driver publishes ResourceSlice objects and KubeVirt reads device metadata from files mounted into virt-launcher.
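For orientation, here is an illustrative sketch of the kind of ResourceSlice a GPU DRA driver might publish for one node. The node name, pool name, device name, and attribute shown are hypothetical placeholders. They are not captured from a real driver, and the actual slices and attributes depend on your driver and its version. You normally do not create ResourceSlice objects yourself.

```yaml
# Illustrative only: node, pool, device, and attribute values are hypothetical.
apiVersion: resource.k8s.io/v1
kind: ResourceSlice
metadata:
  name: gpu-node-1-slice          # hypothetical; drivers generate their own names
spec:
  driver: gpu.nvidia.com          # driver that owns the devices in this slice
  nodeName: gpu-node-1            # hypothetical node that hosts the devices
  pool:
    name: gpu-node-1              # hypothetical pool name
    generation: 1
    resourceSliceCount: 1         # this pool is described by a single slice
  devices:
  - name: gpu-0                   # hypothetical device name
    attributes:
      productName:                # hypothetical driver-scoped attribute
        string: example-gpu-model
```

The scheduler matches ResourceClaim requests against devices listed in slices like this one. The PCI details that KubeVirt needs still come from the metadata files mounted into virt-launcher.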
Note
Throughout this guide, Container Device Interface (CDI) refers to the Kubernetes mechanism that DRA drivers use to inject device nodes into virt-launcher pods — not Containerized Data Importer, which KubeVirt uses for disk import workflows.
The GPUsWithDRA feature gate is in Beta and is enabled by default starting in v1.9.0.
The only earlier release that supports it is v1.8.x, where you must enable it manually.
See Activating feature gates to enable it on v1.8.x or to disable it.
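On v1.8.x, the gate is enabled through the KubeVirt custom resource. The following is a minimal sketch that assumes the default installation, where the KubeVirt CR is named kubevirt and lives in the kubevirt namespace. If your CR already lists other feature gates, keep them in the list: this field holds the complete list of enabled gates, so applying the snippet as-is would drop any gate it does not mention.

```yaml
apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt            # assumed default CR name
  namespace: kubevirt       # assumed default install namespace
spec:
  configuration:
    developerConfiguration:
      featureGates:
      - GPUsWithDRA         # add alongside any gates you already enable
```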
Prerequisites
- GPU DRA driver. Install a compatible external GPU DRA driver in the cluster. This guide documents the NVIDIA k8s-dra-driver-gpu for production clusters. The driver must publish DRA device metadata (KEP-5304), meaning device attributes such as resource.kubernetes.io/pciBusID, in JSON files mounted into virt-launcher. KubeVirt reads these files to build the libvirt domain XML for the GPU. This is separate from the VFIO device nodes that CDI mounts at runtime: CDI delivers the /dev/vfio/* device nodes, while the metadata file identifies which PCI device to attach. Driver authors can find details on publishing these attributes in Access DRA device metadata.
- Host preparation for passthrough GPUs. For passthrough allocations, GPU nodes must meet the usual PCI passthrough requirements (IOMMU enabled, vfio-pci available). See Host Devices Assignment.
Overview
The high-level workflow is:
1. Install a GPU DRA driver on the cluster.
2. Create a ResourceClaimTemplate that requests a GPU from the driver.
3. Create a VMI that references the claim and maps it to spec.domain.devices.gpus.
Installing the DRA GPU driver
DRA driver for NVIDIA GPUs
Install the dra-driver-nvidia-gpu.
For installation details, see the installation guide.
For enabling GPU passthrough support in the DRA driver, see the KubeVirt VFIO GPU passthrough guide.
Creating a ResourceClaimTemplate
Create a ResourceClaimTemplate that requests a GPU from the DRA driver.
Replace vfio.gpu.nvidia.com with the DeviceClass name published by your driver.
When you use the DRA driver for NVIDIA GPUs for KubeVirt passthrough, include the opaque VfioDeviceConfig block shown below.
The NVIDIA driver does not inject the VFIO API device by default, and this block enables it.
See the VfioDeviceConfig parameters in the driver guide for more details.
```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: dra-gpu-claim-template
spec:
  spec:
    devices:
      # Opaque, driver-specific configuration applied to the dra-gpu request
      config:
      - requests:
        - dra-gpu
        opaque:
          driver: gpu.nvidia.com
          parameters:
            apiVersion: resource.nvidia.com/v1beta1
            kind: VfioDeviceConfig
            iommu:
              backendPolicy: LegacyOnly
              enableAPIDevice: true
      requests:
      - name: dra-gpu                  # referenced by gpus[].requestName in the VMI
        exactly:
          allocationMode: ExactCount
          count: 1
          deviceClassName: vfio.gpu.nvidia.com   # replace with your driver's DeviceClass
```
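The deviceClassName field names a cluster-scoped DeviceClass, which the DRA driver normally creates when it is installed. To show how a class name resolves to a set of devices, here is a hypothetical DeviceClass that selects every device published by the gpu.nvidia.com driver through a CEL selector. The class name is made up for illustration, and the driver's own classes, such as vfio.gpu.nvidia.com, may use different selectors. Prefer the classes your driver publishes unless you need a custom selection.

```yaml
# Hypothetical DeviceClass for illustration only.
apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: example-gpu.example.com        # hypothetical class name
spec:
  selectors:
  - cel:
      # Match any device published by the NVIDIA GPU DRA driver
      expression: device.driver == "gpu.nvidia.com"
```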
Creating a VMI with a DRA GPU
Reference the claim template in spec.resourceClaims, then map the allocated GPU to spec.domain.devices.gpus using claimName and requestName.
```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: vm-dra-gpu
  labels:
    app: vm-dra-gpu-test
spec:
  # Claim created from the template when the VMI starts
  resourceClaims:
  - name: dra-gpu-claim
    resourceClaimTemplateName: dra-gpu-claim-template
  domain:
    resources:
      requests:
        memory: 128Mi
    cpu:
      cores: 1
    devices:
      disks:
      - name: containerdisk
        disk:
          bus: virtio
      interfaces:
      - name: default
        masquerade: {}
      # Map the device allocated for request dra-gpu to a guest GPU
      gpus:
      - name: gpu0
        claimName: dra-gpu-claim
        requestName: dra-gpu
  networks:
  - name: default
    pod: {}
  volumes:
  - name: containerdisk
    containerDisk:
      image: quay.io/kubevirt/cirros-container-disk-demo:latest
```
Important
These fields must reference each other:
- spec.resourceClaims[].resourceClaimTemplateName must match the metadata.name of the ResourceClaimTemplate created in step 2 (dra-gpu-claim-template in this example).
- spec.domain.devices.gpus[].claimName must match spec.resourceClaims[].name for the claim that backs the device (dra-gpu-claim in this example).
- spec.domain.devices.gpus[].requestName must match the request name defined in the ResourceClaimTemplate (dra-gpu in this example).
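To make these three links easier to check, the following trimmed excerpts keep only the fields that must line up, with labels (A), (B), and (C) marking each pair. They are cut down for readability and are not meant to be applied as-is; use the full manifests from the earlier steps.

```yaml
# Trimmed excerpt: only the cross-referenced fields are shown.
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: dra-gpu-claim-template                          # (A) template name
spec:
  spec:
    devices:
      requests:
      - name: dra-gpu                                   # (C) request name
        exactly:
          deviceClassName: vfio.gpu.nvidia.com
---
# Trimmed excerpt: required fields such as memory and volumes are omitted.
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: vm-dra-gpu
spec:
  resourceClaims:
  - name: dra-gpu-claim                                 # (B) claim name in the VMI
    resourceClaimTemplateName: dra-gpu-claim-template   # must equal (A)
  domain:
    devices:
      gpus:
      - name: gpu0
        claimName: dra-gpu-claim                        # must equal (B)
        requestName: dra-gpu                            # must equal (C)
```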
When the VMI starts, virt-launcher reads attributes such as resource.kubernetes.io/pciBusID from the DRA device metadata file and uses them, together with the CDI-mounted /dev/vfio/* device nodes, to configure GPU passthrough in the libvirt domain XML.
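If you manage the guest through a VirtualMachine rather than a bare VMI, the same fields should carry over, because spec.template.spec of a VirtualMachine is a VMI spec. The following is a sketch under that assumption, reusing the names from the example above. runStrategy: Always is an arbitrary choice for illustration. Because the claim comes from a template, each new VMI started from this VirtualMachine gets a freshly generated ResourceClaim.

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vm-dra-gpu
spec:
  runStrategy: Always              # illustrative choice; use the strategy you need
  template:
    metadata:
      labels:
        app: vm-dra-gpu-test
    spec:
      # Same DRA wiring as the VMI example, nested under the template
      resourceClaims:
      - name: dra-gpu-claim
        resourceClaimTemplateName: dra-gpu-claim-template
      domain:
        resources:
          requests:
            memory: 128Mi
        cpu:
          cores: 1
        devices:
          disks:
          - name: containerdisk
            disk:
              bus: virtio
          interfaces:
          - name: default
            masquerade: {}
          gpus:
          - name: gpu0
            claimName: dra-gpu-claim
            requestName: dra-gpu
      networks:
      - name: default
        pod: {}
      volumes:
      - name: containerdisk
        containerDisk:
          image: quay.io/kubevirt/cirros-container-disk-demo:latest
```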