CDI DataVolumes

Containerized Data Importer (CDI for short) is a data import service for Kubernetes designed with KubeVirt in mind. Thanks to CDI, we can now enjoy the addition of DataVolumes, which greatly improve the workflow of managing KubeVirt and its storage.

What it does

A DataVolume is an abstraction on top of the Kubernetes PVC (Persistent Volume Claim) resource. It also leverages other CDI features to ease the process of importing data into a Kubernetes cluster.

DataVolumes can be defined by themselves or embedded within a VirtualMachine resource definition. The first method can be used to orchestrate events based on the DataVolume status phases, while the second eases the process of providing storage for a VM.
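As a rough illustration of the first method, a standalone DataVolume could look like the sketch below. It reuses the cdi.kubevirt.io/v1alpha1 API version and the spec layout that show up later in this post in the oc describe dv output. The function name standalone_dv_manifest and the DataVolume name example-dv are made up for this example.

```shell
# Print a standalone DataVolume manifest.
# "example-dv" is a hypothetical name; the spec mirrors the
# dataVolumeTemplates entry used later in this post.
standalone_dv_manifest() {
  cat <<'EOF'
apiVersion: cdi.kubevirt.io/v1alpha1
kind: DataVolume
metadata:
  name: example-dv
spec:
  pvc:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 10Gi
  source:
    http:
      url: "https://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud.qcow2"
EOF
}
```

The output can then be piped into the cluster with standalone_dv_manifest | oc create -f -.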

How does it work?

In this blog post, I’d like to focus on the second method, embedding the information within a VirtualMachine definition, which might seem like the most immediate benefit of this feature. Let’s get started!

Environment description

  • OpenShift

    For testing DataVolumes, I’ve spawned a new OpenShift cluster that uses dynamic provisioning for storage, backed by OpenShift Cloud Storage (GlusterFS), so the Persistent Volumes (PVs for short) are created on demand. Other than that, it’s a regular OpenShift cluster running a single master (also used for infrastructure components) and two compute nodes.

  • CDI

    We also need CDI, of course. CDI can be deployed either together with KubeVirt or independently; the instructions can be found in the project’s GitHub repo.

  • KubeVirt

    Last but not least, we’ll need KubeVirt to run the VMs that will make use of the DataVolumes.

Enabling DataVolumes feature

As of this writing, DataVolumes have to be enabled through a feature gate. For KubeVirt, this is achieved by creating the kubevirt-config ConfigMap in the namespace where KubeVirt has been deployed (kube-system by default).

Let’s create the ConfigMap with the following definition:

---
apiVersion: v1
data:
  feature-gates: DataVolumes
kind: ConfigMap
metadata:
  name: kubevirt-config
  namespace: kube-system

With the definition saved as kubevirt-config-cm.yml, create it:

$ oc create -f kubevirt-config-cm.yml

Alternatively, the following one-liner can also be used to achieve the same result:

$ oc create configmap kubevirt-config --from-literal feature-gates=DataVolumes -n kube-system

If the ConfigMap already exists on the system, just use oc edit to add the DataVolumes feature gate under the data field, as in the YAML above.

If everything went as expected, we should see the following log line in the virt-controller pods:

level=info timestamp=2018-10-09T08:16:53.602400Z pos=application.go:173 component=virt-controller msg="DataVolume integration enabled"

NOTE: The values in the ConfigMap are not dynamic, so virt-controller and virt-api need to be restarted for the change to take effect. This can be done by scaling their deployments down and back up again; just remember to scale them back up to the number of replicas they previously had.
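The restart described in the note could be scripted along these lines. This is only a sketch: restart_by_scaling is a hypothetical helper, and it assumes the deployments are named virt-controller and virt-api and live in kube-system, as in the environment used for this post.

```shell
# Restart a deployment by scaling it to zero and back to its previous size.
# Usage: restart_by_scaling DEPLOYMENT [NAMESPACE]
restart_by_scaling() {
  rbs_deploy="$1"
  rbs_ns="${2:-kube-system}"
  # Remember the current replica count so it can be restored afterwards.
  rbs_replicas="$(oc get deployment "$rbs_deploy" -n "$rbs_ns" \
    -o jsonpath='{.spec.replicas}')" || return 1
  oc scale deployment "$rbs_deploy" -n "$rbs_ns" --replicas=0 || return 1
  oc scale deployment "$rbs_deploy" -n "$rbs_ns" --replicas="$rbs_replicas"
}
```

For example: restart_by_scaling virt-controller && restart_by_scaling virt-api. Reading the replica count first avoids having to remember it by hand.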

Creating a VirtualMachine embedding a DataVolume

Now that the cluster is ready to use the feature, let’s have a look at our VirtualMachine definition, which includes a DataVolume.

apiVersion: kubevirt.io/v1alpha2
kind: VirtualMachine
metadata:
  labels:
    kubevirt.io/vm: testvm1
  name: testvm1
spec:
  dataVolumeTemplates:
    - metadata:
        name: centos7-dv
      spec:
        pvc:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
        source:
          http:
            url: "https://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud.qcow2"
  running: true
  template:
    metadata:
      labels:
        kubevirt.io/vm: testvm1
    spec:
      domain:
        cpu:
          cores: 1
        devices:
          disks:
          - volumeName: test-datavolume
            name: disk0
            disk:
              bus: virtio
          - name: cloudinitdisk
            volumeName: cloudinitvolume
            cdrom:
              bus: virtio
        resources:
          requests:
            memory: 8Gi
      volumes:
      - dataVolume:
          name: centos7-dv
        name: test-datavolume
      - cloudInitNoCloud:
          userData: |
            #cloud-config
            hostname: testvm1
            users:
              - name: kubevirt
                gecos: KubeVirt Project
                sudo: ALL=(ALL) NOPASSWD:ALL
                passwd: $6$JXbc3063IJir.e5h$ypMlYScNMlUtvQ8Il1ldZi/mat7wXTiRioGx6TQmJjTVMandKqr.jJfe99.QckyfH/JJ.OdvLb5/OrCa8ftLr.
                shell: /bin/bash
                home: /home/kubevirt
                lock_passwd: false
        name: cloudinitvolume

The new addition to a regular VirtualMachine definition is the dataVolumeTemplates block, which triggers the import of the CentOS 7 cloud image defined in the url field and stores it on a PV. The resulting DataVolume is named centos7-dv and is referenced in the volumes section, where it serves as the boot disk (disk0) for our VirtualMachine.

Going ahead and applying the above manifest to our cluster results in the following set of events:

  • The DataVolume is created, which triggers the creation of a PVC. Using the dynamic provisioning configured on the cluster, a PV is then provisioned to satisfy the PVC.
  • An importer pod is started. This pod is the one that actually downloads the image defined in the url field and stores it on the provisioned PV.
  • Once the image has been downloaded and stored, the DataVolume status changes to Succeeded. From that point, virt-controller goes ahead and schedules the VirtualMachine.
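To follow this sequence from the command line, one option is to poll the DataVolume phase until it reports Succeeded. The helper below is a minimal sketch; wait_for_dv and its attempt and sleep defaults are arbitrary choices for this example, while the dv short name and the status phase field are the ones visible in the oc describe dv output in this post.

```shell
# Poll a DataVolume until its phase is Succeeded.
# Usage: wait_for_dv NAME [MAX_ATTEMPTS] [SLEEP_SECONDS]
wait_for_dv() {
  wfd_name="$1"
  wfd_attempts="${2:-60}"
  wfd_sleep="${3:-10}"
  wfd_i=0
  while [ "$wfd_i" -lt "$wfd_attempts" ]; do
    # An empty phase means the object is missing or has no status yet.
    wfd_phase="$(oc get dv "$wfd_name" -o jsonpath='{.status.phase}' 2>/dev/null)" || wfd_phase=""
    echo "DataVolume $wfd_name phase: ${wfd_phase:-unknown}"
    if [ "$wfd_phase" = "Succeeded" ]; then
      return 0
    fi
    wfd_i=$((wfd_i + 1))
    sleep "$wfd_sleep"
  done
  # Gave up before the import finished.
  return 1
}
```

For instance, wait_for_dv centos7-dv && virtctl console testvm1 would only try to open the console once the import has finished.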

Taking a look at the resources created after applying the VirtualMachine manifest, we can see the following:

$ oc get pods
NAME                          READY     STATUS      RESTARTS   AGE
importer-centos7-dv-t9zx2     0/1       Completed   0          11m
virt-launcher-testvm1-cpt8n   1/1       Running     0          8m

Let’s look at the importer pod logs to understand what it did:

$ oc logs importer-centos7-dv-t9zx2
I1009 12:37:45.384032       1 importer.go:32] Starting importer
I1009 12:37:45.393461       1 importer.go:37] begin import process
I1009 12:37:45.393519       1 dataStream.go:235] copying "https://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud.qcow2" to "/data/disk.img"...
I1009 12:37:45.393569       1 dataStream.go:112] IMPORTER_ACCESS_KEY_ID and/or IMPORTER_SECRET_KEY are empty
I1009 12:37:45.393606       1 dataStream.go:298] create the initial Reader based on the endpoint's "https" scheme
I1009 12:37:45.393665       1 dataStream.go:208] Attempting to get object "https://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud.qcow2" via http client
I1009 12:37:45.762330       1 dataStream.go:314] constructReaders: checking compression and archive formats: /centos/7/images/CentOS-7-x86_64-GenericCloud.qcow2
I1009 12:37:45.841564       1 dataStream.go:323] found header of type "qcow2"
I1009 12:37:45.841618       1 dataStream.go:338] constructReaders: no headers found for file "/centos/7/images/CentOS-7-x86_64-GenericCloud.qcow2"
I1009 12:37:45.841635       1 dataStream.go:340] done processing "/centos/7/images/CentOS-7-x86_64-GenericCloud.qcow2" headers
I1009 12:37:45.841650       1 dataStream.go:138] NewDataStream: endpoint "https://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud.qcow2"'s computed byte size: 8589934592
I1009 12:37:45.841698       1 dataStream.go:566] Validating qcow2 file
I1009 12:37:46.848736       1 dataStream.go:572] Doing streaming qcow2 to raw conversion
I1009 12:40:07.546308       1 importer.go:43] import complete

Following the events in the log, the importer fetched the image from the defined URL, validated its format and converted it to raw so that QEMU can use it. Next, let’s describe the DataVolume itself:

$ oc describe dv centos7-dv
Name:         centos7-dv
Namespace:    test-dv
Labels:       kubevirt.io/created-by=1916da5f-cbc0-11e8-b467-c81f666533c3
Annotations:  kubevirt.io/owned-by=virt-controller
API Version:  cdi.kubevirt.io/v1alpha1
Kind:         DataVolume
Metadata:
  Creation Timestamp:  2018-10-09T12:37:34Z
  Generation:          1
  Owner References:
    API Version:           kubevirt.io/v1alpha2
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  VirtualMachine
    Name:                  testvm1
    UID:                   1916da5f-cbc0-11e8-b467-c81f666533c3
  Resource Version:        2474310
  Self Link:               /apis/cdi.kubevirt.io/v1alpha1/namespaces/test-dv/datavolumes/centos7-dv
  UID:                     19186b29-cbc0-11e8-b467-c81f666533c3
Spec:
  Pvc:
    Access Modes:
      ReadWriteOnce
    Resources:
      Requests:
        Storage:  10Gi
  Source:
    Http:
      URL:  https://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud.qcow2
Status:
  Phase:  Succeeded
Events:
  Type    Reason  Age                 From                   Message
  ----    ------  ----                ----                   -------
  Normal  Synced  29s (x13 over 14m)  datavolume-controller  DataVolume synced successfully
  Normal  Synced  18s                 datavolume-controller  DataVolume synced successfully

The DataVolume description matches what was defined under dataVolumeTemplates. Now, as we know it uses a PV/PVC underneath, let’s have a look:

$ oc describe pvc centos7-dv
Name:          centos7-dv
Namespace:     test-dv
StorageClass:  glusterfs-storage
Status:        Bound
Volume:        pvc-191d27c6-cbc0-11e8-b467-c81f666533c3
Labels:        app=containerized-data-importer
               cdi-controller=centos7-dv
Annotations:   cdi.kubevirt.io/storage.import.endpoint=https://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud.qcow2
               cdi.kubevirt.io/storage.import.importPodName=importer-centos7-dv-t9zx2
               cdi.kubevirt.io/storage.pod.phase=Succeeded
               pv.kubernetes.io/bind-completed=yes
               pv.kubernetes.io/bound-by-controller=yes
               volume.beta.kubernetes.io/storage-provisioner=kubernetes.io/glusterfs
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      10Gi
Access Modes:  RWO
Events:
  Type    Reason                 Age   From                         Message
  ----    ------                 ----  ----                         -------
  Normal  ProvisioningSucceeded  18m   persistentvolume-controller  Successfully provisioned volume pvc-191d27c6-cbc0-11e8-b467-c81f666533c3 using kubernetes.io/glusterfs

It’s important to pay attention to the annotations, as these are monitored and set by CDI. CDI triggers an import when it detects the cdi.kubevirt.io/storage.import.endpoint annotation, assigns a pod as the import task owner and updates the pod phase annotation.
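When the describe output gets long, a small filter can isolate just the CDI annotations. The function below is a hypothetical helper, not part of CDI or oc; it simply extracts every token starting with cdi.kubevirt.io/ from whatever text it reads on standard input.

```shell
# Print only the CDI annotations (key=value) found on standard input.
# Usage: oc describe pvc NAME | cdi_annotations
cdi_annotations() {
  # -o prints just the matching part of each line, one match per line.
  grep -o 'cdi\.kubevirt\.io/[^[:space:]]*'
}
```

Running oc describe pvc centos7-dv | cdi_annotations against the PVC above would list the endpoint, importPodName and pod phase annotations.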

At this point everything is in place: the DataVolume has its underlying components and the image has been imported, so the VirtualMachine can start the VirtualMachineInstance based on its definition, using the CentOS 7 image as its boot disk. As users, we can connect to its console as usual, for instance by running the following command:

$ virtctl console testvm1

Cleaning it up

Once we’re happy with the results, it’s time to clean up all these tests. The task is easy:

$ oc delete vm testvm1

Once the VM (and its associated VMI) is gone, all the underlying storage resources are removed as well; there is no trace of the PVC, PV or DataVolume.

$ oc get dv centos7-dv
$ oc get pvc centos7-dv
$ oc get pv pvc-191d27c6-cbc0-11e8-b467-c81f666533c3

All three commands returned No resources found.
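The same check could be automated with a small helper like the one below. It is a sketch that assumes oc get exits with a non-zero status when the named object does not exist; all_gone is a made-up name for this example.

```shell
# Succeed only if none of the given "KIND NAME" pairs still exist.
# Usage: all_gone KIND NAME [KIND NAME ...]
all_gone() {
  ag_status=0
  while [ "$#" -ge 2 ]; do
    # A successful "oc get" means the object is still around.
    if oc get "$1" "$2" >/dev/null 2>&1; then
      echo "$1/$2 still exists"
      ag_status=1
    fi
    shift 2
  done
  return "$ag_status"
}
```

For example: all_gone dv centos7-dv pvc centos7-dv pv pvc-191d27c6-cbc0-11e8-b467-c81f666533c3.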