KubeVirt Summit is our annual online conference, now in its fourth year, where the broader community meets to showcase technical architecture, new features, proposed changes, and in-depth tutorials. We have two tracks: one for developer talks, and another for end users to share their deployment journey with KubeVirt and their use case(s) at scale. And there's no reason why a talk can't be both :)
The event will take place online over two half-days.
Do consider proposing a session, and help make our fourth Summit as valuable as possible. We welcome a range of session types, any of which can be simple and intended for beginners or face-meltingly technical. Check out last year’s talks for some ideas.
Reach out on our virtualization Slack channel (in the Kubernetes workspace).
Connect with the KubeVirt Community through our mailing list, slack channels, weekly meetings, and more, all listed in our community repo.
Good luck!
Released on: Tue Mar 5 20:25:04 2024 +0000

Release notes highlights (excerpts):

- …the kubevirt.io/ksm-enabled node label to true if KSM is managed by KubeVirt, instead of reflecting the actual KSM value.
- …the vmRolloutStrategy setting to define whether changes to VMs should either be always staged or live-updated when possible.
- …the kubevirt.io:default clusterRole to get,list kubevirts.
You can read the full v1.1 release notes here, but we’ve asked the KubeVirt SIGs to summarize their largest successes, as well as one of the community members from Arm to list their integration accomplishments for this release.
SIG-compute covers the core functionality of KubeVirt. This includes scheduling VMs, the API, and all KubeVirt operators.
For the v1.1 release, we have added quite a few features. This includes memory hotplug, as a follow-up to CPU hotplug, which was part of the 1.0 release. Basic KSM support was already part of KubeVirt, but we have now extended that with more tuning parameters, and KubeVirt can also dynamically configure KSM based on system pressure. We've added persistent NVRAM support (requires that a VM use UEFI) so that settings are preserved across reboots.
We’ve also added host-side USB passthrough support, so that USB devices on a cluster node can be made available to workloads. KubeVirt can now automatically apply limits to a VM running in a namespace with quotas. We’ve also added refinements to VM cloning, as well as the ability to create clones using the virtctl CLI tool. And you can now stream guest’s console logs.
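For example, the console log stream is exposed through a dedicated container in the VM's virt-launcher pod; a hedged sketch (the pod name below is illustrative, and the guest-console-log container name assumes the v1.1 default):

# Find the virt-launcher pod for VM "testvm", then follow its serial console log
kubectl get pods -l vm.kubevirt.io/name=testvm
kubectl logs -f virt-launcher-testvm-abcde -c guest-console-log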
Finally, on the confidential computing front, we now have an API for SEV attestation.
SIG-infra takes care of KubeVirt’s own infrastructure, user workloads and other user-focused integrations through automation and the reduction of complexity wherever possible, providing a quality experience for end users.
In this release, two major instance type-related features were added to KubeVirt. The first feature is the deployment of Common InstanceTypes by the virt-operator. This provides users with a useful set of InstanceTypes and Preferences right out of the box and allows them to easily create virtual machines tailored to the needs of their workloads. For now this feature remains behind a feature gate, but in future versions we aim to enable the deployment by default.
Secondly, the inference of InstanceTypes and Preferences has been enabled by default when creating virtual machines with virtctl. This feature was already present in the previous release, but users still needed to explicitly enable it. Now it is enabled by default and is as transparent as possible, so that the creation of virtual machines does not fail when inference is not possible. This significantly improves usability, as the command line for creating virtual machines is now even simpler.
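As a sketch of how simple this has become (the image and names are illustrative; inference only succeeds when the boot volume carries instance type and preference hints):

# Generate a VM manifest; the instance type and preference are inferred
# from the boot volume when possible
virtctl create vm --name demo-vm \
  --volume-containerdisk src:quay.io/containerdisks/fedora:latest \
  | kubectl create -f -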
SIG-network is committed to enhancing and maintaining all aspects of Virtual Machine network connectivity and management in KubeVirt.
For the v1.1 release, we have re-designed the interface hot plug/unplug API, while adding hotplug support for SR-IOV interfaces. On top of that, we have added a network binding option allowing the community to extend the KubeVirt network configuration in the pod by injecting custom CNI plugins to configure the networking stack, and a sidecar to configure the libvirt domain. The existing slirp network configuration has been extracted from the code and re-designed as one such network binding, and can be used by the community as an example of how to extend KubeVirt bindings.
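As a hedged sketch of how such a binding is registered (the field layout follows the v1.1 network binding plugin API behind the NetworkBindingPlugins feature gate; the image tag is the one referenced in the release notes):

apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt
  namespace: kubevirt
spec:
  configuration:
    network:
      binding:
        slirp:
          sidecarImage: quay.io/kubevirt/network-slirp-binding:20230830_638c60fc8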
SIG-scale continues to track scale and performance across releases. The v1.1 testing lanes ran on Kubernetes 1.27 and we observed a slight performance improvement from Kubernetes. There are no other notable performance or scale changes in KubeVirt v1.1, as our focus has been on improving our tracking.
Full v1.1 data source: https://github.com/kubevirt/kubevirt/blob/main/docs/perf-scale-benchmarks.md
SIG-storage is focused on providing persistent storage to KubeVirt VMs and managing that storage throughout the lifecycle of the VM. This begins with provisioning and populating PVCs with bootable images but also includes features such as disk hotplug, snapshots, backup and restore, disaster recovery, and virtual machine export.
For this release we aimed to draw closer to Kubernetes principles when it comes to managing storage artifacts. We introduced CDI volume populators, CDI's implementation of importing/uploading/cloning data to PVCs using the dataSourceRef field. This follows the Kubernetes way of populating PVCs and enables us to populate PVCs directly without the need for DataVolumes, an important but bespoke object that has served the KubeVirt use case for many years.
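A minimal sketch of that flow, assuming a CDI version with volume populators enabled (names and the source image are illustrative):

---
apiVersion: cdi.kubevirt.io/v1beta1
kind: VolumeImportSource
metadata:
  name: fedora-import
spec:
  source:
    registry:
      url: "docker://quay.io/containerdisks/fedora:latest"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fedora-pvc
spec:
  # The populator fills the PVC directly, no DataVolume required
  dataSourceRef:
    apiGroup: cdi.kubevirt.io
    kind: VolumeImportSource
    name: fedora-import
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi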
Speaking of DataVolumes, they will no longer be garbage collected by default, something that violated a fundamental principle of Kubernetes (even though it was very useful for our use case).
And, finally, we can now use snapshots to store operating system “golden images”, to serve as the base image for cloning.
We are excited to announce the successful integration of KubeVirt on Arm64 platforms. Here are some key accomplishments:
We are thrilled to declare that KubeVirt now offers tier-one support on Arm64 platforms. This milestone represents a culmination of collaborative efforts, unwavering dedication, and a commitment to innovation within the KubeVirt community. KubeVirt is no longer just an option; it has evolved to become a first-class citizen on Arm64 platforms.
Thank you to everyone in the KubeVirt Community who contributed to this release, whether you pitched in on any of the features listed above, helped out with any of the other features or maintenance improvements listed in our release notes, or made any number of non-code contributions to our website, user guide or meetings.
Released on: Mon Nov 6 16:28:56 2023 +0000

Release notes highlights (excerpts):

- …when the AutoResourceLimits FeatureGate is enabled.
- common-instancetypes resources can now be deployed by virt-operator using the CommonInstancetypesDeploymentGate feature gate.
- An instancetype.kubevirt.io:view ClusterRole has been introduced that can be bound to users via a ClusterRoleBinding to provide read-only access to the cluster-scoped VirtualMachineCluster{Instancetype,Preference} resources.
- …list and watch verbs from virt-controller's RBAC.
- …quay.io/kubevirt/network-slirp-binding:20230830_638c60fc8. On the next release (v1.2.0) no default image will be set and registering an image will be mandatory.
- …the kubevirt.io/schedulable label when finding the lowest TSC frequency on the cluster.
- …spec.config.machineType in the KubeVirt CR.
- ControllerRevisions containing instancetype.kubevirt.io CRDs are now decorated with labels detailing specific metadata of the underlying stashed object.
- …virtctl create clone marshalling and replacement of kubectl with kubectl virt.
- nodeSelector and schedulerName fields have been added to the VirtualMachineInstancetype spec.

For this article, we'll learn about the process of setting up KubeVirt with Cluster Autoscaler on EKS. In addition, we'll be using bare metal nodes to host KubeVirt VMs.
This article will talk about how to make various software systems work together, but introducing each one in detail is outside of its scope. Thus, you must already be familiar with each of the tools used below.
All the code used in this article may also be found at github.com/relaxdiego/kubevirt-cas-baremetal.
First let’s set some environment variables:
# The name of the EKS cluster we're going to create
export RD_CLUSTER_NAME=my-cluster
# The region where we will create the cluster
export RD_REGION=us-west-2
# Kubernetes version to use
export RD_K8S_VERSION=1.27
# The name of the keypair that we're going to inject into the nodes. You
# must create this ahead of time in the correct region.
export RD_EC2_KEYPAIR_NAME=eks-my-cluster
Using eksctl, prepare an EKS cluster config:
eksctl create cluster \
--dry-run \
--name=${RD_CLUSTER_NAME} \
--nodegroup-name ng-infra \
--node-type m5.xlarge \
--nodes 2 \
--nodes-min 2 \
--nodes-max 2 \
--node-labels workload=infra \
--region=${RD_REGION} \
--ssh-access \
--ssh-public-key ${RD_EC2_KEYPAIR_NAME} \
--version ${RD_K8S_VERSION} \
--vpc-nat-mode HighlyAvailable \
--with-oidc \
> cluster.yaml
--dry-run means the command will not actually create the cluster but will instead output a config to stdout, which we then write to cluster.yaml.
Open the file and look at what it has produced.
For more info on the schema used by cluster.yaml, see the Config file schema page from eksctl.io.
This cluster will start out with a node group that we will use to host our “infra” services. This is why we are using the cheaper m5.xlarge rather than a baremetal instance type. However, we also need to ensure that none of our VMs will ever be scheduled on these nodes. Thus we need to taint them. In the generated cluster.yaml file, append the following taint to the only node group in the managedNodeGroups list:
managedNodeGroups:
- amiFamily: AmazonLinux2
...
taints:
- key: CriticalAddonsOnly
effect: NoSchedule
We can now create the cluster:
eksctl create cluster --config-file cluster.yaml
Example output:
2023-08-20 07:59:14 [ℹ] eksctl version ...
2023-08-20 07:59:14 [ℹ] using region us-west-2 ...
2023-08-20 07:59:14 [ℹ] subnets for us-west-2a ...
2023-08-20 07:59:14 [ℹ] subnets for us-west-2b ...
2023-08-20 07:59:14 [ℹ] subnets for us-west-2c ...
...
2023-08-20 08:14:06 [ℹ] kubectl command should work with ...
2023-08-20 08:14:06 [✔] EKS cluster "my-cluster" in "us-west-2" is ready
Once the command is done, you should be able to query the kube API. For example:
kubectl get nodes
Example output:
NAME STATUS ROLES AGE VERSION
ip-XXX.compute.internal Ready <none> 32m v1.27.4-eks-2d98532
ip-YYY.compute.internal Ready <none> 32m v1.27.4-eks-2d98532
As per this section of the Cluster Autoscaler docs:
If you’re using Persistent Volumes, your deployment needs to run in the same AZ as where the EBS volume is, otherwise the pod scheduling could fail if it is scheduled in a different AZ and cannot find the EBS volume. To overcome this, either use a single AZ ASG for this use case, or an ASG-per-AZ while enabling --balance-similar-node-groups.
Based on the above, we will create a node group for each of the availability zones (AZs) declared in cluster.yaml so that the Cluster Autoscaler will always bring up a node in the AZ where a VM's EBS-backed PV is located.
To do that, we will first prepare a template that we can then feed to envsubst. Save the following in node-group.yaml.template:
---
# See: Config File Schema <https://eksctl.io/usage/schema/>
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
name: ${RD_CLUSTER_NAME}
region: ${RD_REGION}
managedNodeGroups:
- name: ng-${EKS_AZ}-c5-metal
amiFamily: AmazonLinux2
instanceType: c5.metal
availabilityZones:
- ${EKS_AZ}
desiredCapacity: 1
maxSize: 3
minSize: 0
labels:
alpha.eksctl.io/cluster-name: my-cluster
alpha.eksctl.io/nodegroup-name: ng-${EKS_AZ}-c5-metal
workload: vm
privateNetworking: false
ssh:
allow: true
publicKeyPath: ${RD_EC2_KEYPAIR_NAME}
volumeSize: 500
volumeIOPS: 10000
volumeThroughput: 750
volumeType: gp3
propagateASGTags: true
tags:
alpha.eksctl.io/nodegroup-name: ng-${EKS_AZ}-c5-metal
alpha.eksctl.io/nodegroup-type: managed
k8s.io/cluster-autoscaler/my-cluster: owned
k8s.io/cluster-autoscaler/enabled: "true"
# The following tags help CAS determine that this node group is able
# to satisfy the label and resource requirements of the KubeVirt VMs.
# See: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md#auto-discovery-setup
k8s.io/cluster-autoscaler/node-template/resources/devices.kubevirt.io/kvm: "1"
k8s.io/cluster-autoscaler/node-template/resources/devices.kubevirt.io/tun: "1"
k8s.io/cluster-autoscaler/node-template/resources/devices.kubevirt.io/vhost-net: "1"
k8s.io/cluster-autoscaler/node-template/resources/ephemeral-storage: 50M
k8s.io/cluster-autoscaler/node-template/label/kubevirt.io/schedulable: "true"
The last few tags bear additional emphasis. They are required because, when a virtual machine is created, it will have the following requirements:
requests:
devices.kubevirt.io/kvm: 1
devices.kubevirt.io/tun: 1
devices.kubevirt.io/vhost-net: 1
ephemeral-storage: 50M
nodeSelectors: kubevirt.io/schedulable=true
However, at least when scaling from zero for the first time, CAS will have no knowledge of this information unless the correct AWS tags are added to the node group. This is why we have the following added to the managed node group’s tags:
k8s.io/cluster-autoscaler/node-template/resources/devices.kubevirt.io/kvm: "1"
k8s.io/cluster-autoscaler/node-template/resources/devices.kubevirt.io/tun: "1"
k8s.io/cluster-autoscaler/node-template/resources/devices.kubevirt.io/vhost-net: "1"
k8s.io/cluster-autoscaler/node-template/resources/ephemeral-storage: 50M
k8s.io/cluster-autoscaler/node-template/label/kubevirt.io/schedulable: "true"
For more information on these tags, see Auto-Discovery Setup.
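For context, these node-template tags pair with the autoscaler's tag-based auto-discovery. A hedged excerpt of the container arguments such a deployment typically carries (the actual template lives in the companion repo):

# Illustrative cluster-autoscaler args; the tag keys match the node group above
- --cloud-provider=aws
- --balance-similar-node-groups
- --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster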
We can now create the node group:
yq .availabilityZones[] cluster.yaml -r | \
xargs -I{} bash -c "
export EKS_AZ={};
envsubst < node-group.yaml.template | \
eksctl create nodegroup --config-file -
"
The following was adapted from KubeVirt quickstart with cloud providers.
Deploy the KubeVirt operator:
kubectl create -f https://github.com/kubevirt/kubevirt/releases/download/v1.0.0/kubevirt-operator.yaml
So that the operator will know how to deploy KubeVirt, let's add the KubeVirt resource:
cat <<EOF | kubectl apply -f -
apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
name: kubevirt
namespace: kubevirt
spec:
certificateRotateStrategy: {}
configuration:
developerConfiguration:
featureGates: []
customizeComponents: {}
imagePullPolicy: IfNotPresent
workloadUpdateStrategy: {}
infra:
nodePlacement:
nodeSelector:
workload: infra
tolerations:
- key: CriticalAddonsOnly
operator: Exists
EOF
Notice how we are specifically configuring KubeVirt itself to tolerate the CriticalAddonsOnly taint. This is so that the KubeVirt services themselves can be scheduled on the infra nodes instead of the bare metal nodes, which we want to scale down to zero when there are no VMs.
Wait until KubeVirt is in a Deployed state:
kubectl get -n kubevirt -o=jsonpath="{.status.phase}" \
kubevirt.kubevirt.io/kubevirt
Example output:
Deployed
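If you prefer a single blocking command, kubectl wait can block until the KubeVirt CR reports the Available condition:

kubectl -n kubevirt wait kv kubevirt --timeout=360s --for condition=Available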
Double check that all KubeVirt components are healthy:
kubectl get pods -n kubevirt
Example output:
NAME READY STATUS RESTARTS AGE
pod/virt-api-674467958c-5chhj 1/1 Running 0 98d
pod/virt-api-674467958c-wzcmk 1/1 Running 0 5d
pod/virt-controller-6768977b-49wwb 1/1 Running 0 98d
pod/virt-controller-6768977b-6pfcm 1/1 Running 0 5d
pod/virt-handler-4hztq 1/1 Running 0 5d
pod/virt-handler-x98x5 1/1 Running 0 98d
pod/virt-operator-85f65df79b-lg8xb 1/1 Running 0 5d
pod/virt-operator-85f65df79b-rp8p5 1/1 Running 0 98d
The following is copied from kubevirt.io.
First create a secret from your public key:
kubectl create secret generic my-pub-key --from-file=key1=~/.ssh/id_rsa.pub
Next, create the VM:
# Create a VM referencing the Secret using propagation method configDrive
cat <<EOF | kubectl create -f -
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
name: testvm
spec:
running: true
template:
spec:
domain:
devices:
disks:
- disk:
bus: virtio
name: containerdisk
- disk:
bus: virtio
name: cloudinitdisk
rng: {}
resources:
requests:
memory: 1024M
terminationGracePeriodSeconds: 0
accessCredentials:
- sshPublicKey:
source:
secret:
secretName: my-pub-key
propagationMethod:
configDrive: {}
volumes:
- containerDisk:
image: quay.io/containerdisks/fedora:latest
name: containerdisk
- cloudInitConfigDrive:
userData: |-
#cloud-config
password: fedora
chpasswd: { expire: False }
name: cloudinitdisk
EOF
Check that the test VM is running:
kubectl get vm
Example output:
NAME AGE STATUS READY
testvm 30s Running True
Delete the VM:
kubectl delete vm testvm
So that CAS can set the desired capacity of each node group dynamically, we must grant it limited access to certain AWS resources. The first step to this is to define the IAM policy.
This section is based on the “Create an IAM policy and role” section of the AWS Autoscaling documentation.
Prepare the policy document by rendering the following file.
cat > policy.json <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "VisualEditor0",
"Effect": "Allow",
"Action": [
"autoscaling:SetDesiredCapacity",
"autoscaling:TerminateInstanceInAutoScalingGroup"
],
"Resource": "*"
},
{
"Sid": "VisualEditor1",
"Effect": "Allow",
"Action": [
"autoscaling:DescribeAutoScalingInstances",
"autoscaling:DescribeAutoScalingGroups",
"ec2:DescribeLaunchTemplateVersions",
"autoscaling:DescribeTags",
"autoscaling:DescribeLaunchConfigurations",
"ec2:DescribeInstanceTypes"
],
"Resource": "*"
}
]
}
EOF
The above should be enough for CAS to do its job. Next, create the policy:
aws iam create-policy \
--policy-name eks-${RD_REGION}-${RD_CLUSTER_NAME}-ClusterAutoscalerPolicy \
--policy-document file://policy.json
IMPORTANT: Take note of the returned policy ARN. You will need that below.
The Cluster Autoscaler needs a service account in the k8s cluster that's associated with an IAM role that consumes the policy document we created in the previous section. This is normally a two-step process but can be done in a single command using eksctl:
For more information on what eksctl is doing under the covers, see How It Works from the eksctl documentation for IAM Roles for Service Accounts.
export RD_POLICY_ARN="<Get this value from the last command's output>"
eksctl create iamserviceaccount \
--cluster=${RD_CLUSTER_NAME} \
--region=${RD_REGION} \
--namespace=kube-system \
--name=cluster-autoscaler \
--attach-policy-arn=${RD_POLICY_ARN} \
--override-existing-serviceaccounts \
--approve
Double check that the cluster-autoscaler service account has been correctly annotated with the IAM role that was created by eksctl in the same step:
kubectl get sa cluster-autoscaler -n kube-system -ojson | \
jq -r '.metadata.annotations | ."eks.amazonaws.com/role-arn"'
Example output:
arn:aws:iam::365499461711:role/eksctl-my-cluster-addon-iamserviceaccount-...
Check from the AWS Console if the above role contains the policy that we created earlier.
First, find the most recent Cluster Autoscaler version that has the same MAJOR and MINOR version as the Kubernetes cluster you're deploying to.
Get the kube cluster’s version:
kubectl version -ojson | jq -r .serverVersion.gitVersion
Example output:
v1.27.4-eks-2d98532
Choose the appropriate version for CAS. You can get the latest Cluster Autoscaler versions from its GitHub Releases page.
Example:
export CLUSTER_AUTOSCALER_VERSION=1.27.3
Next, deploy the cluster autoscaler using the deployment template that I prepared in the companion repo:
envsubst < <(curl https://raw.githubusercontent.com/relaxdiego/kubevirt-cas-baremetal/main/cas-deployment.yaml.template) | \
kubectl apply -f -
Check the cluster autoscaler status:
kubectl get deploy,pod -l app=cluster-autoscaler -n kube-system
Example output:
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/cluster-autoscaler 1/1 1 1 4m1s
NAME READY STATUS RESTARTS AGE
pod/cluster-autoscaler-6c58bd6d89-v8wbn 1/1 Running 0 60s
Tail the cluster-autoscaler pod's logs to see what's happening:
kubectl -n kube-system logs -f deployment.apps/cluster-autoscaler
Below are example log entries from Cluster Autoscaler terminating an unneeded node:
node ip-XXXX.YYYY.compute.internal may be removed
...
ip-XXXX.YYYY.compute.internal was unneeded for 1m3.743475455s
Once the timeout has been reached (default: 10 minutes), CAS will scale down the group:
Scale-down: removing empty node ip-XXXX.YYYY.compute.internal
Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", ...
Successfully added ToBeDeletedTaint on node ip-XXXX.YYYY.compute.internal
Terminating EC2 instance: i-ZZZZ
DeleteInstances was called: ...
For more information on how Cluster Autoscaler scales down a node group, see How does scale-down work? from the project’s FAQ.
When you try to get the list of nodes, you should see the bare metal nodes tainted such that they are no longer schedulable:
NAME STATUS ROLES AGE VERSION
ip-XXXX Ready,SchedulingDisabled <none> 70m v1.27.3-eks-a5565ad
ip-XXXX Ready,SchedulingDisabled <none> 70m v1.27.3-eks-a5565ad
ip-XXXX Ready,SchedulingDisabled <none> 70m v1.27.3-eks-a5565ad
ip-XXXX Ready <none> 112m v1.27.3-eks-a5565ad
ip-XXXX Ready <none> 112m v1.27.3-eks-a5565ad
In a few more minutes, the nodes will be deleted.
To try the scale up, just deploy a VM. You should then see scale-up activity in the CAS logs:
Expanding Node Group eks-ng-eacf8ebb ...
Best option to resize: eks-ng-eacf8ebb
Estimated 1 nodes needed in eks-ng-eacf8ebb
Final scale-up plan: [{eks-ng-eacf8ebb 0->1 (max: 3)}]
Scale-up: setting group eks-ng-eacf8ebb size to 1
Setting asg eks-ng-eacf8ebb size to 1
At this point you should have a working, auto-scaling EKS cluster that can host VMs on bare metal nodes. If you have any questions, ask them here.
Infrastructure teams managing virtual machines (VMs) and the end users of these systems make use of a variety of tools as part of their day-to-day world. One such tool that is shared amongst these two groups is Ansible, an agentless automation tool for the enterprise. To simplify both the adoption and usage of KubeVirt as well as to integrate seamlessly into existing workflows, the KubeVirt community is excited to introduce the release of the first version of the KubeVirt collection for Ansible, called kubevirt.core, which includes a number of tools that you do not want to miss.
This article will review some of the features and their use associated with this initial release.
Note: There is also a video version of this blog, which can be found on the KubeVirt YouTube channel.
Before diving into the featureset of the collection itself, let’s review why the collection was created in the first place.
While adopting KubeVirt and Kubernetes has the potential to disrupt the workflows of teams that typically manage VM infrastructure, including the end users themselves, many of the same paradigms remain:
For these reasons and more, it is only natural that a tool like Ansible is introduced within the KubeVirt community. Not only can it help manage KubeVirt and Kubernetes resources, like VirtualMachines, but it can also enable the extensive Ansible ecosystem for managing guest configurations.
As part of the initial release, an Ansible inventory plugin and a management module are included. They are available from Ansible Galaxy, the standard distribution location for Ansible automation content. The resources encompassing the collection itself are detailed in the following sections.
To work with KubeVirt VMs in Ansible, they need to be available in Ansible's hosts inventory. Since KubeVirt is already using the Kubernetes API to manage VMs, it would be nice to leverage this API to discover hosts with Ansible too. This is where the dynamic inventory of the kubevirt.core collection comes into play.
The dynamic inventory capability allows you to query the Kubernetes API for available VMs in a given namespace or namespaces, along with additional filtering options, such as labels. To allow Ansible to find the right connection parameters for a VM, the network name of a secondary interface can also be specified.
Under the hood, the dynamic inventory uses either your default kubectl credentials or credentials specified in the inventory parameters to establish the connection with a cluster.
While working with existing VMs is already quite useful, it would be even better to control the entire lifecycle of KubeVirt VirtualMachines from Ansible. This is made possible by the kubevirt_vm module provided by the kubevirt.core collection.
The kubevirt_vm module is a thin wrapper around the kubernetes.core.k8s module, and it allows you to control the essential fields of a KubeVirt VirtualMachine's specification. In true Ansible fashion, this module tries to be as idempotent as possible and only makes changes to objects within Kubernetes if necessary. With its wait feature, it is possible to delay further tasks until a VM was successfully created or updated and is in the ready state, or was successfully deleted.
Now that we've provided an introduction to the featureset, it is time to illustrate how you can get up to speed with the collection, including a few examples to showcase its capabilities.
Please note that as a prerequisite, Ansible needs to be installed and configured along with a working Kubernetes cluster with KubeVirt and the KubeVirt Cluster Network Addons Operator. The cluster also needs to have a secondary network configured, which can be attached to VMs so that the machine can be reached from the Ansible control node.
First, install the kubevirt.core collection from Ansible Galaxy:
ansible-galaxy collection install kubevirt.core
This will also install the kubernetes.core collection as a dependency.
Second, create a new Namespace and a Secret containing a public key for SSH authentication:
ssh-keygen -f my-key
kubectl create namespace kubevirt-ansible
kubectl create secret generic my-pub-key --from-file=key1=my-key.pub -n kubevirt-ansible
With the collection now installed and the SSH key pair created, create a file called play-create.yml containing an Ansible playbook to deploy a new VM called testvm:
- hosts: localhost
connection: local
tasks:
- name: Create VM
kubevirt.core.kubevirt_vm:
state: present
name: testvm
namespace: kubevirt-ansible
labels:
app: test
instancetype:
name: u1.medium
preference:
name: fedora
spec:
domain:
devices:
interfaces:
- name: default
masquerade: {}
- name: secondary-network
bridge: {}
networks:
- name: default
pod: {}
- name: secondary-network
multus:
networkName: secondary-network
accessCredentials:
- sshPublicKey:
source:
secret:
secretName: my-pub-key
propagationMethod:
configDrive: {}
volumes:
- containerDisk:
image: quay.io/containerdisks/fedora:latest
name: containerdisk
- cloudInitConfigDrive:
userData: |-
#cloud-config
# The default username is: fedora
name: cloudinit
wait: yes
Run the playbook by executing the following command:
ansible-playbook play-create.yml
Once the playbook completes successfully, the defined VM will be running in the kubevirt-ansible namespace, which can be confirmed by querying for VirtualMachines in this namespace:
kubectl get VirtualMachine -n kubevirt-ansible
With the VM deployed, it is eligible for use in Ansible automation activities. Let's illustrate how it can be queried and added to an Ansible inventory dynamically using the plugin provided by the kubevirt.core collection.
Create a file called inventory.kubevirt.yml containing the following content:
plugin: kubevirt.core.kubevirt
connections:
- namespaces:
- kubevirt-ansible
network_name: secondary-network
label_selector: app=test
Use the ansible-inventory command to confirm the VM has been added to the Ansible inventory:
ansible-inventory -i inventory.kubevirt.yml --list
Next, make use of the host by querying for all of the facts exposed by the VM using the setup module:
ansible -i inventory.kubevirt.yml -u fedora --key-file my-key all -m setup
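You can also target the discovered hosts with any other ad-hoc module; for example, a quick connectivity check with the ping module:

ansible -i inventory.kubevirt.yml -u fedora --key-file my-key all -m ping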
Complete the lifecycle of the VM by destroying the previously created VirtualMachine and Namespace. Create a file called play-delete.yml containing the following playbook:
- hosts: localhost
tasks:
- name: Delete VM
kubevirt.core.kubevirt_vm:
name: testvm
namespace: kubevirt-ansible
state: absent
wait: yes
- name: Delete namespace
kubernetes.core.k8s:
name: kubevirt-ansible
api_version: v1
kind: Namespace
state: absent
Run the playbook to remove the VM:
ansible-playbook play-delete.yml
More information including the full list of parameters and options can be found within the collection documentation:
https://kubevirt.io/kubevirt.core
This has been a brief introduction to the concepts and usage of the newly released kubevirt.core collection. Nevertheless, we hope it has helped showcase the integration now available between KubeVirt and Ansible, including how easy it is to manage KubeVirt assets. A next potential iteration could be to expose a VM via a Kubernetes Service using one of the methods described in this article, instead of a secondary interface as was covered in this walkthrough. Not only does that leverage existing models outside the KubeVirt ecosystem, but it helps to enable a uniform method for exposing content.
Interested in learning more, providing feedback or contributing? Head over to the kubevirt.core GitHub repository to continue your journey and get involved.
Kubernetes NetworkPolicies are constructs to control traffic flow at the IP address or port level (OSI layers 3 or 4). They allow the user to specify how a pod (or group of pods) is allowed to communicate with other entities on the network. In simpler words: the user can specify ingress from or egress to other workloads, using L3 / L4 semantics.
Keeping in mind NetworkPolicy is a Kubernetes construct - which only cares about a single network interface - it is only usable for the cluster's default network interface. This leaves a considerable gap for Virtual Machine users, since they are heavily invested in secondary networks.
The k8snetworkplumbingwg has addressed this limitation by providing a MultiNetworkPolicy CRD - it features the exact same API as NetworkPolicy but can target network-attachment-definitions. OVN-Kubernetes implements this API, and configures access control accordingly for secondary networks in the cluster.
In this post we will see how we can govern access control for VMs using the multi-network policy API. In our simple example, we'll only allow traffic into our VMs when it ingresses from a particular CIDR range.
MultiNetworkPolicies for VMs

Kubernetes NetworkPolicy has three types of policy peers: ipBlock, namespaceSelector, and podSelector.
While MultiNetworkPolicy allows these three types, when used with VMs we recommend using only the IPBlock policy peer - both namespace and pod selectors prevent the live-migration of Virtual Machines (these policy peers require OVN-K managed IPAM, and currently the live-migration feature is only available when IPAM is not enabled on the interfaces).
To run this demo, we will prepare a Kubernetes cluster with the following components installed:
The following section will show you how to create a KinD cluster, with upstream latest OVN-Kubernetes, upstream latest multus-cni, and the multi-network policy CRDs deployed.
Refer to the OVN-Kubernetes repo KIND documentation for more details; the gist of it is you should clone the OVN-Kubernetes repository, and run their kind helper script:
git clone git@github.com:ovn-org/ovn-kubernetes.git
cd ovn-kubernetes
pushd contrib ; ./kind.sh --multi-network-enable ; popd
This will get you a running kind cluster (one control plane, and two worker nodes), configured to use OVN-Kubernetes as the default cluster network, configuring the multi-homing OVN-Kubernetes feature gate, and deploying multus-cni in the cluster.
Follow KubeVirt's user guide to install the latest released version (currently, v1.0.0).
export RELEASE=$(curl https://storage.googleapis.com/kubevirt-prow/release/kubevirt/kubevirt/stable.txt)
kubectl apply -f "https://github.com/kubevirt/kubevirt/releases/download/${RELEASE}/kubevirt-operator.yaml"
kubectl apply -f "https://github.com/kubevirt/kubevirt/releases/download/${RELEASE}/kubevirt-cr.yaml"
kubectl -n kubevirt wait kv kubevirt --timeout=360s --for condition=Available
Now we have a Kubernetes cluster with all the pieces to start the Demo.
In this example, we will configure a MultiNetworkPolicy allowing ingress into our VMs only from a particular CIDR range - let's say 10.200.0.0/30.
Provision the following NAD (to allow our VMs to live-migrate, we do not define a subnet):
---
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
name: flatl2net
spec:
config: |2
{
"cniVersion": "0.4.0",
"name": "flatl2net",
"type": "ovn-k8s-cni-overlay",
"topology":"layer2",
"netAttachDefName": "default/flatl2net"
}
Let's now provision our six VMs; each name maps to an IP address statically configured via cloud-init (vm1 gets 10.200.0.1, vm2 gets 10.200.0.2, and so on up to vm6 with 10.200.0.6):
---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
labels:
kubevirt.io/vm: vm1
name: vm1
spec:
running: true
template:
metadata:
labels:
name: access-control
kubevirt.io/domain: vm1
kubevirt.io/vm: vm1
spec:
domain:
devices:
disks:
- disk:
bus: virtio
name: containerdisk
- disk:
bus: virtio
name: cloudinitdisk
interfaces:
- bridge: {}
name: flatl2-overlay
rng: {}
resources:
requests:
memory: 1024Mi
networks:
- multus:
networkName: flatl2net
name: flatl2-overlay
terminationGracePeriodSeconds: 30
volumes:
- containerDisk:
image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:v1.0.0
name: containerdisk
- cloudInitNoCloud:
networkData: |
ethernets:
eth0:
addresses:
- 10.200.0.1/24
version: 2
userData: |-
#cloud-config
user: fedora
password: password
chpasswd: { expire: False }
name: cloudinitdisk
---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
labels:
kubevirt.io/vm: vm2
name: vm2
spec:
running: true
template:
metadata:
labels:
name: access-control
kubevirt.io/domain: vm2
kubevirt.io/vm: vm2
spec:
domain:
devices:
disks:
- disk:
bus: virtio
name: containerdisk
- disk:
bus: virtio
name: cloudinitdisk
interfaces:
- bridge: {}
name: flatl2-overlay
rng: {}
resources:
requests:
memory: 1024Mi
networks:
- multus:
networkName: flatl2net
name: flatl2-overlay
terminationGracePeriodSeconds: 30
volumes:
- containerDisk:
image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:v1.0.0
name: containerdisk
- cloudInitNoCloud:
networkData: |
ethernets:
eth0:
addresses:
- 10.200.0.2/24
version: 2
userData: |-
#cloud-config
user: fedora
password: password
chpasswd: { expire: False }
name: cloudinitdisk
---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
labels:
kubevirt.io/vm: vm3
name: vm3
spec:
running: true
template:
metadata:
labels:
name: access-control
kubevirt.io/domain: vm3
kubevirt.io/vm: vm3
spec:
domain:
devices:
disks:
- disk:
bus: virtio
name: containerdisk
- disk:
bus: virtio
name: cloudinitdisk
interfaces:
- bridge: {}
name: flatl2-overlay
rng: {}
resources:
requests:
memory: 1024Mi
networks:
- multus:
networkName: flatl2net
name: flatl2-overlay
terminationGracePeriodSeconds: 30
volumes:
- containerDisk:
image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:v1.0.0
name: containerdisk
- cloudInitNoCloud:
networkData: |
ethernets:
eth0:
addresses:
- 10.200.0.3/24
version: 2
userData: |-
#cloud-config
user: fedora
password: password
chpasswd: { expire: False }
name: cloudinitdisk
---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
labels:
kubevirt.io/vm: vm4
name: vm4
spec:
running: true
template:
metadata:
labels:
name: access-control
kubevirt.io/domain: vm4
kubevirt.io/vm: vm4
spec:
domain:
devices:
disks:
- disk:
bus: virtio
name: containerdisk
- disk:
bus: virtio
name: cloudinitdisk
interfaces:
- bridge: {}
name: flatl2-overlay
rng: {}
resources:
requests:
memory: 1024Mi
networks:
- multus:
networkName: flatl2net
name: flatl2-overlay
terminationGracePeriodSeconds: 30
volumes:
- containerDisk:
image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:v1.0.0
name: containerdisk
- cloudInitNoCloud:
networkData: |
ethernets:
eth0:
addresses:
- 10.200.0.4/24
version: 2
userData: |-
#cloud-config
user: fedora
password: password
chpasswd: { expire: False }
name: cloudinitdisk
---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
labels:
kubevirt.io/vm: vm5
name: vm5
spec:
running: true
template:
metadata:
labels:
name: access-control
kubevirt.io/domain: vm5
kubevirt.io/vm: vm5
spec:
domain:
devices:
disks:
- disk:
bus: virtio
name: containerdisk
- disk:
bus: virtio
name: cloudinitdisk
interfaces:
- bridge: {}
name: flatl2-overlay
rng: {}
resources:
requests:
memory: 1024Mi
networks:
- multus:
networkName: flatl2net
name: flatl2-overlay
terminationGracePeriodSeconds: 30
volumes:
- containerDisk:
image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:v1.0.0
name: containerdisk
- cloudInitNoCloud:
networkData: |
ethernets:
eth0:
addresses:
- 10.200.0.5/24
version: 2
userData: |-
#cloud-config
user: fedora
password: password
chpasswd: { expire: False }
name: cloudinitdisk
---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
labels:
kubevirt.io/vm: vm6
name: vm6
spec:
running: true
template:
metadata:
labels:
name: access-control
kubevirt.io/domain: vm6
kubevirt.io/vm: vm6
spec:
domain:
devices:
disks:
- disk:
bus: virtio
name: containerdisk
- disk:
bus: virtio
name: cloudinitdisk
interfaces:
- bridge: {}
name: flatl2-overlay
rng: {}
resources:
requests:
memory: 1024Mi
networks:
- multus:
networkName: flatl2net
name: flatl2-overlay
terminationGracePeriodSeconds: 30
volumes:
- containerDisk:
image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:v1.0.0
name: containerdisk
- cloudInitNoCloud:
networkData: |
ethernets:
eth0:
addresses:
- 10.200.0.6/24
version: 2
userData: |-
#cloud-config
user: fedora
password: password
chpasswd: { expire: False }
name: cloudinitdisk
NOTE: it is important to highlight that all the Virtual Machines (and the network-attachment-definition) are defined in the default namespace.
After this step, we should have the following deployment:
Let's check that VMs vm1 and vm4 can ping their peers in the same subnet. For that we will connect to the VMs over their serial console.

First, let's check vm1:
➜ virtctl console vm1
Successfully connected to vm1 console. The escape sequence is ^]
[fedora@vm1 ~]$ ping 10.200.0.2 -c 4
PING 10.200.0.2 (10.200.0.2) 56(84) bytes of data.
64 bytes from 10.200.0.2: icmp_seq=1 ttl=64 time=5.16 ms
64 bytes from 10.200.0.2: icmp_seq=2 ttl=64 time=1.41 ms
64 bytes from 10.200.0.2: icmp_seq=3 ttl=64 time=34.2 ms
64 bytes from 10.200.0.2: icmp_seq=4 ttl=64 time=2.56 ms
--- 10.200.0.2 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3005ms
rtt min/avg/max/mdev = 1.406/10.841/34.239/13.577 ms
[fedora@vm1 ~]$ ping 10.200.0.6 -c 4
PING 10.200.0.6 (10.200.0.6) 56(84) bytes of data.
64 bytes from 10.200.0.6: icmp_seq=1 ttl=64 time=3.77 ms
64 bytes from 10.200.0.6: icmp_seq=2 ttl=64 time=1.46 ms
64 bytes from 10.200.0.6: icmp_seq=3 ttl=64 time=5.47 ms
64 bytes from 10.200.0.6: icmp_seq=4 ttl=64 time=1.74 ms
--- 10.200.0.6 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3007ms
rtt min/avg/max/mdev = 1.459/3.109/5.469/1.627 ms
[fedora@vm1 ~]$
And from vm4:
➜ ~ virtctl console vm4
Successfully connected to vm4 console. The escape sequence is ^]
[fedora@vm4 ~]$ ping 10.200.0.1 -c 4
PING 10.200.0.1 (10.200.0.1) 56(84) bytes of data.
64 bytes from 10.200.0.1: icmp_seq=1 ttl=64 time=3.20 ms
64 bytes from 10.200.0.1: icmp_seq=2 ttl=64 time=1.62 ms
64 bytes from 10.200.0.1: icmp_seq=3 ttl=64 time=1.44 ms
64 bytes from 10.200.0.1: icmp_seq=4 ttl=64 time=0.951 ms
--- 10.200.0.1 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3006ms
rtt min/avg/max/mdev = 0.951/1.803/3.201/0.843 ms
[fedora@vm4 ~]$ ping 10.200.0.6 -c 4
PING 10.200.0.6 (10.200.0.6) 56(84) bytes of data.
64 bytes from 10.200.0.6: icmp_seq=1 ttl=64 time=1.85 ms
64 bytes from 10.200.0.6: icmp_seq=2 ttl=64 time=1.02 ms
64 bytes from 10.200.0.6: icmp_seq=3 ttl=64 time=1.27 ms
64 bytes from 10.200.0.6: icmp_seq=4 ttl=64 time=0.970 ms
--- 10.200.0.6 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3005ms
rtt min/avg/max/mdev = 0.970/1.275/1.850/0.350 ms
We will now provision a MultiNetworkPolicy applying to all the VMs defined above. To do this mapping correctly, the policy has to:

- have a k8s.v1.cni.cncf.io/policy-for annotation matching the secondary network used by the VM.
- have a matchLabels selector matching the labels set on the VM's spec.template.metadata.

This policy will allow ingress into these access-control labeled pods only if the traffic originates from within the 10.200.0.0/30 CIDR range (IPs 10.200.0.1-3).
---
apiVersion: k8s.cni.cncf.io/v1beta1
kind: MultiNetworkPolicy
metadata:
name: ingress-ipblock
annotations:
k8s.v1.cni.cncf.io/policy-for: default/flatl2net
spec:
podSelector:
matchLabels:
name: access-control
policyTypes:
- Ingress
ingress:
- from:
- ipBlock:
cidr: 10.200.0.0/30
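Apply the policy and confirm the object exists (the file name here is illustrative):

kubectl apply -f ingress-ipblock.yaml
kubectl get multi-networkpolicy ingress-ipblock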
Taking into account our example, only vm1, vm2, and vm3 will be able to contact any of their peers, as pictured by the following diagram:
Let's try the ping again after provisioning the MultiNetworkPolicy object:
From vm1 (inside the allowed ip block range):
[fedora@vm1 ~]$ ping 10.200.0.2 -c 4
PING 10.200.0.2 (10.200.0.2) 56(84) bytes of data.
64 bytes from 10.200.0.2: icmp_seq=1 ttl=64 time=6.48 ms
64 bytes from 10.200.0.2: icmp_seq=2 ttl=64 time=4.40 ms
64 bytes from 10.200.0.2: icmp_seq=3 ttl=64 time=1.28 ms
64 bytes from 10.200.0.2: icmp_seq=4 ttl=64 time=1.51 ms
--- 10.200.0.2 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3006ms
rtt min/avg/max/mdev = 1.283/3.418/6.483/2.154 ms
[fedora@vm1 ~]$ ping 10.200.0.6 -c 4
PING 10.200.0.6 (10.200.0.6) 56(84) bytes of data.
64 bytes from 10.200.0.6: icmp_seq=1 ttl=64 time=3.81 ms
64 bytes from 10.200.0.6: icmp_seq=2 ttl=64 time=2.67 ms
64 bytes from 10.200.0.6: icmp_seq=3 ttl=64 time=1.68 ms
64 bytes from 10.200.0.6: icmp_seq=4 ttl=64 time=1.63 ms
--- 10.200.0.6 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3006ms
rtt min/avg/max/mdev = 1.630/2.446/3.808/0.888 ms
From vm4 (outside the allowed ip block range):
[fedora@vm4 ~]$ ping 10.200.0.1 -c 4
PING 10.200.0.1 (10.200.0.1) 56(84) bytes of data.
--- 10.200.0.1 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3083ms
[fedora@vm4 ~]$ ping 10.200.0.6 -c 4
PING 10.200.0.6 (10.200.0.6) 56(84) bytes of data.
--- 10.200.0.6 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3089ms
In this post we've shown how MultiNetworkPolicies can be used to provide access control to VMs with secondary network interfaces.
We have provided a comprehensive example on how a policy can be used to limit ingress to our VMs only from desired sources, based on the client’s IP address.
The KubeVirt project started in Red Hat at the end of 2016, with the question: Can virtual machines (VMs) run in containers and be deployed by Kubernetes? It proved to be not only possible, but quickly emerged as a promising solution to the future of virtual machines in the container age. KubeVirt joined the CNCF as a Sandbox project in September 2019, and an Incubating project in April 2022. From a handful of people hacking away on a proof of concept, KubeVirt has grown into 45 active repositories, with the primary kubevirt/kubevirt repo having 17k commits and 1k forks.
The v1.0 release signifies the incredible growth that the community has gone through in the past six years from an idea to a production-ready Virtual Machine Management solution. The next stage with v1.0 is the additional focus on maintaining APIs while continuing to grow the project. This has led KubeVirt to adopt community practices from Kubernetes in key parts of the project.
Leading up to this release we had a shift in release cadence: from monthly to 3 times a year, following the Kubernetes release model. This allows our developer community additional time to ensure stability and compatibility, our users more time to plan and comfortably upgrade, and also aligns our releases with Kubernetes to simplify maintenance and supportability.
The theme ‘aligning with Kubernetes’ is also felt through the other parts of the community, by following their governance processes; introducing SIGs to split test and review responsibilities, as well as a SIG release repo to handle everything related to a release; and regular SIG meetings that now include SIG scale and performance and SIG storage alongside our weekly Community meetings.
This release demonstrates the accomplishments of the community and user adoption over the past many months. The full list of feature and bug fixes can be found in our release notes, but we’ve also asked representatives from some of our SIGs for a summary.
KubeVirt's SIG-scale drives the performance and scalability initiatives in the community. Our focus for the v1.0 release was on sharing the performance results over the past 6 months. The benchmarks since December 2022, which cover the past two releases - v0.59 (Mar 2023) and v1.0 (July 2023) - are as follows:
Performance benchmarks for v1.0 release
Scalability benchmarks for v1.0 release
Publishing these measurements provides the community and end-users visibility into the performance and scalability over multiple releases. In addition, these results help identify the effects of code changes so that community members can diagnose performance problems and regressions.
End-users can use the same tools and techniques SIG-scale uses to analyze performance and scalability in their own deployments. Since performance and scalability are mostly relative to the deployment stack, the same strategies should be used to further contextualize the community’s measurements.
SIG-storage is focused on providing persistent storage to KubeVirt VMs and managing that storage throughout the lifecycle of the VM. This begins with provisioning and populating PVCs with bootable images but also includes features such as disk hotplug, snapshots, backup and restore, disaster recovery, and virtual machine export.
For v1.0, SIG-storage delivered the following features: providing a flexible VM export API, enabling persistent SCSI reservation, provisioning VMs from a retained snapshot, and setting out-of-the-box defaults for additional storage provisioners. Another major effort was to implement Volume Populator alternatives to the KubeVirt DataVolume API in order to better leverage platform capabilities. The SIG meets every 2 weeks and welcomes anyone to join us for interesting storage discussions.
SIG-compute is focused on the core virtualization functionality of KubeVirt, but also encompasses features that don’t fit well into another SIG. Some examples of SIG-compute’s scope include the lifecycle of VMs, migration, as well as maintenance of the core API.
For v1.0, SIG-compute developed features for memory over-commit. This includes initial support for KSM and FreePageReporting. We added support for persistent vTPM, which makes it much easier to use BitLocker on Windows installs. Additionally, there’s now an initial implementation for CPU Hotplug (currently hidden behind a feature gate).
SIG-network is committed to enhancing and maintaining all aspects of Virtual Machine network connectivity and management in KubeVirt.
For the v1.0 release, we have introduced hot plug and hot unplug (as alpha), which enables users to add and remove VM secondary network interfaces that use bridge binding on a running VM. Hot plug API stabilization and support for SR-IOV interfaces is under development for the next minor release.
The effort to simplify the VirtualMachine UX is still ongoing and with the v1.0 release we were able to introduce the v1beta1 version of the instancetype.kubevirt.io API. In the future KubeVirt v1.1.0 release we are aiming to finally graduate the instancetype.kubevirt.io API to v1.
With the new version it is now possible to control the memory overcommit of virtual machines as a percentage within instance types. Resource requirements were added to preferences, which allows users to ensure that requirements of a workload are met. Also several new preference attributes have been added to cover more use cases.
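As a hedged sketch, a v1beta1 instance type using the new memory overcommit percentage could look like this (names and values are illustrative):

apiVersion: instancetype.kubevirt.io/v1beta1
kind: VirtualMachineInstancetype
metadata:
  name: overcommitted-medium
spec:
  cpu:
    guest: 1
  memory:
    guest: 4Gi
    # Report 4Gi to the guest while requesting 25% less from Kubernetes
    overcommitPercent: 25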
Moreover, virtctl was extended to make use of the new instance type and preference features.
From a development perspective, we will continue to introduce and improve features that make life easier for virtualization users in a manner that is as native to Kubernetes as possible. From a community perspective, we are improving our new contributor experience so that we can continue to grow and help new members learn and be a part of the cloud native ecosystem. In addition, with this milestone we can now shift our attention on becoming a CNCF Graduated project.
Released on: Thu Jul 6 17:39:42 2023 +0000

Release notes highlights (excerpts):

- …when the KUBEVIRT_RELEASE env var is true.
- …the kubevirt.io/interface resource name API for reserving domain resources for network interfaces.
- …the virtctl image-upload command, allowing users to associate a default instance type and/or preference with an image during upload: --default-instancetype, --default-instancetype-kind, --default-preference and --default-preference-kind. See the user-guide documentation for more details on using the uploaded image with the inferFromVolume feature during VirtualMachine creation.
- The v1beta1 version of the instancetype.kubevirt.io API and CRDs has been introduced.
- …the kubevirt_vmi_memory_cached_bytes metric.
- …VirtualMachineOptions to specify virtual machine behavior at cluster level.
- With the v1.0.0 release of KubeVirt, the storage version of all core kubevirt.io APIs will be moving to version v1. To accommodate the eventual removal of the v1alpha3 version with KubeVirt >=v1.2.0, it is recommended that operators deploy the kube-storage-version-migrator tool within their environment. This will ensure any existing v1alpha3 stored objects are migrated to v1 well in advance of the removal of the underlying v1alpha3 version.
- …the kubevirt.io/interface resource name to reserve domain resources for network interfaces.
- …the kubevirt.io/ksm-enabled label.
- The kubevirt.io/v1 apiVersion is now the default storage version for newly created objects.
- …RUNBOOK_URL_TEMPLATE for the runbooks URL template.
- …updates to the Name of a {Instancetype,Preference}Matcher without also updating the RevisionName are now rejected.
- …the podConfigDone field in favor of a new source option in infoSource.
- The dedicatedCPUPlacement attribute is once again supported within the VirtualMachineInstancetype and VirtualMachineClusterInstancetype CRDs after a recent bugfix improved VirtualMachine validations, ensuring defaults are applied before any attempt to validate.

OVN (Open Virtual Network) is a series of daemons for Open vSwitch that translate virtual network configurations into OpenFlow. It provides virtual networking capabilities for any type of workload on a virtualized platform (virtual machines and containers) using the same API.
OVN provides a higher-layer of abstraction than Open vSwitch, working with logical routers and logical switches, rather than flows. More details can be found in the OVN architecture man page.
In this post we will repeat the scenario of its bridge CNI equivalent, using this SDN approach. This secondary network topology is akin to the one described in the flatL2 topology, but allows connectivity to the physical underlay.
To run this demo, we will prepare a Kubernetes cluster with the following components installed:
The following section will show you how to create a KinD cluster, with upstream latest OVN-Kubernetes, and upstream latest multus-cni deployed.
Refer to the OVN-Kubernetes repo KIND documentation for more details; the gist of it is you should clone the OVN-Kubernetes repository, and run their kind helper script:
git clone git@github.com:ovn-org/ovn-kubernetes.git
cd ovn-kubernetes
pushd contrib ; ./kind.sh --multi-network-enable ; popd
This will get you a running kind cluster, configured to use OVN-Kubernetes as the default cluster network, configuring the multi-homing OVN-Kubernetes feature gate, and deploying multus-cni in the cluster.
Follow KubeVirt's user guide to install the latest released version (currently, v0.59.0).
export RELEASE=$(curl https://storage.googleapis.com/kubevirt-prow/release/kubevirt/kubevirt/stable.txt)
kubectl apply -f "https://github.com/kubevirt/kubevirt/releases/download/${RELEASE}/kubevirt-operator.yaml"
kubectl apply -f "https://github.com/kubevirt/kubevirt/releases/download/${RELEASE}/kubevirt-cr.yaml"
kubectl -n kubevirt wait kv kubevirt --timeout=360s --for condition=Available
Now we have a Kubernetes cluster with all the pieces to start the Demo.
In this scenario we will see how traffic from a single localnet network can be connected to a physical network in the host using a dedicated bridge.
This scenario does not use any VLAN encapsulation, thus is simpler, since the network admin does not need to provision any VLANs in advance.
When you've started the KinD cluster with the --multi-network-enable flag, an additional OCI network was created and attached to each of the KinD nodes. But still, further steps may be required, depending on the desired L2 configuration.
Let’s first create a dedicated OVS bridge, and attach the aforementioned virtualized network to it:
for node in $(kubectl -n ovn-kubernetes get pods -l app=ovs-node -o jsonpath="{.items[*].metadata.name}")
do
kubectl -n ovn-kubernetes exec -ti $node -- ovs-vsctl --may-exist add-br ovsbr1
kubectl -n ovn-kubernetes exec -ti $node -- ovs-vsctl --may-exist add-port ovsbr1 eth1
kubectl -n ovn-kubernetes exec -ti $node -- ovs-vsctl set open . external_ids:ovn-bridge-mappings=physnet:breth0,localnet-network:ovsbr1
done
The first two commands are self-evident: you create an OVS bridge, and attach a port to it; the last one is not. In it, we're using the OVN bridge mapping API to configure which OVS bridge must be used for each physical network. It creates a patch port between the OVN integration bridge - br-int - and the OVS bridge you tell it to, and traffic will be forwarded to/from it with the help of a localnet port.
NOTE: The provided mapping must match the name within the net-attach-def .Spec.Config JSON, otherwise, the patch ports will not be created.
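You can read the mapping back on each node to confirm it was applied (a sketch; substitute a real ovs-node pod name):

kubectl -n ovn-kubernetes exec -ti <ovs-node-pod> -- ovs-vsctl get open . external_ids:ovn-bridge-mappings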
You will also have to configure an IP address on the bridge for the extra-network the kind script created. For that, you first need to identify the bridge’s name. In the example below we’re providing a command for the podman runtime:
podman network inspect underlay --format '{{ .NetworkInterface }}'
podman3
ip addr add 10.128.0.1/24 dev podman3
NOTE: for docker, please use the following command:
ip a | grep `docker network inspect underlay --format '{{ index .IPAM.Config 0 "Gateway" }}'` | awk '{print $NF}'
br-0aeb0318f71f
ip addr add 10.128.0.1/24 dev br-0aeb0318f71f
Let’s also use an IP in the same subnet as the network subnet (defined in the NAD). This IP address must be excluded from the IPAM pool (also on the NAD), otherwise the OVN-Kubernetes IPAM may assign it to a workload.
Once the underlay is configured, we can now provision the attachment configuration:
---
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
name: localnet-network
spec:
config: |2
{
"cniVersion": "0.3.1",
"name": "localnet-network",
"type": "ovn-k8s-cni-overlay",
"topology": "localnet",
"subnets": "10.128.0.0/24",
"excludeSubnets": "10.128.0.1/32",
"netAttachDefName": "default/localnet-network"
}
It is required to list the gateway IP in the excludeSubnets attribute, thus preventing OVN-Kubernetes from assigning that IP address to the workloads.
These two VMs can be used for the single broadcast domain scenario (no VLANs).
---
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
name: vm-server
spec:
running: true
template:
spec:
nodeSelector:
kubernetes.io/hostname: ovn-worker
domain:
devices:
disks:
- name: containerdisk
disk:
bus: virtio
- name: cloudinitdisk
disk:
bus: virtio
interfaces:
- name: localnet
bridge: {}
machine:
type: ""
resources:
requests:
memory: 1024M
networks:
- name: localnet
multus:
networkName: localnet-network
terminationGracePeriodSeconds: 0
volumes:
- name: containerdisk
containerDisk:
image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:devel
- name: cloudinitdisk
cloudInitNoCloud:
networkData: |
version: 2
ethernets:
eth0:
dhcp4: true
userData: |-
#cloud-config
password: fedora
chpasswd: { expire: False }
---
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
name: vm-client
spec:
running: true
template:
spec:
nodeSelector:
kubernetes.io/hostname: ovn-worker2
domain:
devices:
disks:
- name: containerdisk
disk:
bus: virtio
- name: cloudinitdisk
disk:
bus: virtio
interfaces:
- name: localnet
bridge: {}
machine:
type: ""
resources:
requests:
memory: 1024M
networks:
- name: localnet
multus:
networkName: localnet-network
terminationGracePeriodSeconds: 0
volumes:
- name: containerdisk
containerDisk:
image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:devel
- name: cloudinitdisk
cloudInitNoCloud:
networkData: |
version: 2
ethernets:
eth0:
dhcp4: true
userData: |-
#cloud-config
password: fedora
chpasswd: { expire: False }
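Assuming both manifests are saved in a single file - say vms-localnet.yaml, a hypothetical name - create the VMs and wait for the guest agents to report in, which is when IP addresses show up in the VMI status:
kubectl apply -f vms-localnet.yaml
kubectl wait vmi vm-server vm-client --for=condition=AgentConnected --timeout=5m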
You can check east/west connectivity between both VMs via ICMP:
$ kubectl get vmi vm-server -ojsonpath="{ @.status.interfaces }" | jq
[
{
"infoSource": "domain, guest-agent, multus-status",
"interfaceName": "eth0",
"ipAddress": "10.128.0.2",
"ipAddresses": [
"10.128.0.2",
"fe80::e83d:16ff:fe76:c1bd"
],
"mac": "ea:3d:16:76:c1:bd",
"name": "localnet",
"queueCount": 1
}
]
$ virtctl console vm-client
Successfully connected to vm-client console. The escape sequence is ^]
[fedora@vm-client ~]$ ping 10.128.0.2
PING 10.128.0.2 (10.128.0.2) 56(84) bytes of data.
64 bytes from 10.128.0.2: icmp_seq=1 ttl=64 time=0.808 ms
64 bytes from 10.128.0.2: icmp_seq=2 ttl=64 time=0.478 ms
64 bytes from 10.128.0.2: icmp_seq=3 ttl=64 time=0.536 ms
64 bytes from 10.128.0.2: icmp_seq=4 ttl=64 time=0.507 ms
--- 10.128.0.2 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3005ms
rtt min/avg/max/mdev = 0.478/0.582/0.808/0.131 ms
We can now start an HTTP server on the host, listening on the IP configured on the gateway bridge:
python3 -m http.server --bind 10.128.0.1 9000
And finally curl this from your client:
[fedora@vm-client ~]$ curl -v 10.128.0.1:9000
* Trying 10.128.0.1:9000...
* Connected to 10.128.0.1 (10.128.0.1) port 9000 (#0)
> GET / HTTP/1.1
> Host: 10.128.0.1:9000
> User-Agent: curl/7.69.1
> Accept: */*
>
* Mark bundle as not supporting multiuse
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Server: SimpleHTTP/0.6 Python/3.11.3
< Date: Thu, 01 Jun 2023 16:05:09 GMT
< Content-type: text/html; charset=utf-8
< Content-Length: 2923
...
This example features two physical networks, each on a different VLAN, both pointing at the same OVS bridge.
Again, the first thing to do is create a dedicated OVS bridge and attach the aforementioned virtualized network to it, this time defining the attached port as a trunk for two broadcast domains, with VLAN tags 10 and 20.
for node in $(kubectl -n ovn-kubernetes get pods -l app=ovs-node -o jsonpath="{.items[*].metadata.name}")
do
kubectl -n ovn-kubernetes exec -ti $node -- ovs-vsctl --may-exist add-br ovsbr1
kubectl -n ovn-kubernetes exec -ti $node -- ovs-vsctl --may-exist add-port ovsbr1 eth1 trunks=10,20 vlan_mode=trunk
kubectl -n ovn-kubernetes exec -ti $node -- ovs-vsctl set open . external_ids:ovn-bridge-mappings=physnet:breth0,tenantblue:ovsbr1,tenantred:ovsbr1
done
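You can verify the trunk configuration landed on each node; a minimal check, run against any of the node pods (here reusing the last value of $node from the loop above):
kubectl -n ovn-kubernetes exec -ti $node -- ovs-vsctl get port eth1 trunks vlan_mode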
We must now configure the physical network: since packets leave the OVS bridge tagged with either VLAN 10 or VLAN 20, the physical network where the virtualized nodes run must be able to handle the tagged traffic.
For that we will create two VLAN interfaces, each with a different subnet. We again need the name of the bridge the kind script created to implement the extra network, and each VLAN interface must also be configured with an IP address (for docker, see the previous example):
podman network inspect underlay --format '{{ .NetworkInterface }}'
podman3
# create the VLANs
ip link add link podman3 name podman3.10 type vlan id 10
ip addr add 192.168.123.1/24 dev podman3.10
ip link set dev podman3.10 up
ip link add link podman3 name podman3.20 type vlan id 20
ip addr add 192.168.124.1/24 dev podman3.20
ip link set dev podman3.20 up
NOTE: both the tenantblue and tenantred networks forward their traffic to the ovsbr1 OVS bridge.
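If you want to see the tagged frames for yourself once the VMs below start talking, tcpdump on the parent interface shows the 802.1Q headers (interface name taken from the podman example):
# -e prints link-level headers, making the VLAN tag visible
tcpdump -nn -e -i podman3 vlan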
Let us now provision the attachment configuration for the two physical networks. Notice they do not have a subnet defined, which means our workloads must configure static IPs via cloud-init.
---
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
name: tenantred
spec:
config: |2
{
"cniVersion": "0.3.1",
"name": "tenantred",
"type": "ovn-k8s-cni-overlay",
"topology": "localnet",
"vlanID": 10,
"netAttachDefName": "default/tenantred"
}
---
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
name: tenantblue
spec:
config: |2
{
"cniVersion": "0.3.1",
"name": "tenantblue",
"type": "ovn-k8s-cni-overlay",
"topology": "localnet",
"vlanID": 20,
"netAttachDefName": "default/tenantblue"
}
NOTE: each of the tenantblue and tenantred networks tags its traffic with a different VLAN, which must be listed in the port's trunks configuration.
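As before, both attachments are provisioned with kubectl; assuming a file named tenant-networks.yaml (again, a name of our choosing):
kubectl apply -f tenant-networks.yaml
kubectl get net-attach-def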
The following four VMs (two per network) can be used for the OVS bridge sharing scenario (two physical networks share the same OVS bridge, each on a different VLAN).
---
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
name: vm-red-1
spec:
running: true
template:
spec:
nodeSelector:
kubernetes.io/hostname: ovn-worker
domain:
devices:
disks:
- name: containerdisk
disk:
bus: virtio
- name: cloudinitdisk
disk:
bus: virtio
interfaces:
- name: physnet-red
bridge: {}
machine:
type: ""
resources:
requests:
memory: 1024M
networks:
- name: physnet-red
multus:
networkName: tenantred
terminationGracePeriodSeconds: 0
volumes:
- name: containerdisk
containerDisk:
image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:devel
- name: cloudinitdisk
cloudInitNoCloud:
networkData: |
version: 2
ethernets:
eth0:
addresses: [ 192.168.123.10/24 ]
userData: |-
#cloud-config
password: fedora
chpasswd: { expire: False }
---
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
name: vm-red-2
spec:
running: true
template:
spec:
nodeSelector:
kubernetes.io/hostname: ovn-worker
domain:
devices:
disks:
- name: containerdisk
disk:
bus: virtio
- name: cloudinitdisk
disk:
bus: virtio
interfaces:
- name: physnet-red
bridge: {}
machine:
type: ""
resources:
requests:
memory: 1024M
networks:
- name: physnet-red
multus:
networkName: tenantred
terminationGracePeriodSeconds: 0
volumes:
- name: containerdisk
containerDisk:
image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:devel
- name: cloudinitdisk
cloudInitNoCloud:
networkData: |
version: 2
ethernets:
eth0:
addresses: [ 192.168.123.20/24 ]
userData: |-
#cloud-config
password: fedora
chpasswd: { expire: False }
---
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
name: vm-blue-1
spec:
running: true
template:
spec:
nodeSelector:
kubernetes.io/hostname: ovn-worker
domain:
devices:
disks:
- name: containerdisk
disk:
bus: virtio
- name: cloudinitdisk
disk:
bus: virtio
interfaces:
- name: physnet-blue
bridge: {}
machine:
type: ""
resources:
requests:
memory: 1024M
networks:
- name: physnet-blue
multus:
networkName: tenantblue
terminationGracePeriodSeconds: 0
volumes:
- name: containerdisk
containerDisk:
image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:devel
- name: cloudinitdisk
cloudInitNoCloud:
networkData: |
version: 2
ethernets:
eth0:
addresses: [ 192.168.124.10/24 ]
userData: |-
#cloud-config
password: fedora
chpasswd: { expire: False }
---
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
name: vm-blue-2
spec:
running: true
template:
spec:
nodeSelector:
kubernetes.io/hostname: ovn-worker
domain:
devices:
disks:
- name: containerdisk
disk:
bus: virtio
- name: cloudinitdisk
disk:
bus: virtio
interfaces:
- name: physnet-blue
bridge: {}
machine:
type: ""
resources:
requests:
memory: 1024M
networks:
- name: physnet-blue
multus:
networkName: tenantblue
terminationGracePeriodSeconds: 0
volumes:
- name: containerdisk
containerDisk:
image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:devel
- name: cloudinitdisk
cloudInitNoCloud:
networkData: |
version: 2
ethernets:
eth0:
addresses: [ 192.168.124.20/24 ]
userData: |-
#cloud-config
password: fedora
chpasswd: { expire: False }
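With the four manifests saved - say to vms-vlan.yaml, another illustrative name - create the VMs and watch them come up on the ovn-worker node:
kubectl apply -f vms-vlan.yaml
kubectl get vmi -w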
You can check east/west connectivity between both red VMs via ICMP:
$ kubectl get vmi vm-red-2 -ojsonpath="{ @.status.interfaces }" | jq
[
{
"infoSource": "domain, guest-agent",
"interfaceName": "eth0",
"ipAddress": "192.168.123.20",
"ipAddresses": [
"192.168.123.20",
"fe80::e83d:16ff:fe76:c1bd"
],
"mac": "ea:3d:16:76:c1:bd",
"name": "flatl2-overlay",
"queueCount": 1
}
]
$ virtctl console vm-red-1
Successfully connected to vm-red-1 console. The escape sequence is ^]
[fedora@vm-red-1 ~]$ ping 192.168.123.20
PING 192.168.123.20 (192.168.123.20) 56(84) bytes of data.
64 bytes from 192.168.123.20: icmp_seq=1 ttl=64 time=0.534 ms
64 bytes from 192.168.123.20: icmp_seq=2 ttl=64 time=0.246 ms
64 bytes from 192.168.123.20: icmp_seq=3 ttl=64 time=0.178 ms
64 bytes from 192.168.123.20: icmp_seq=4 ttl=64 time=0.236 ms
--- 192.168.123.20 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3028ms
rtt min/avg/max/mdev = 0.178/0.298/0.534/0.138 ms
The same behavior can be seen on the VMs attached to the blue network:
$ kubectl get vmi vm-blue-2 -ojsonpath="{ @.status.interfaces }" | jq
[
{
"infoSource": "domain, guest-agent",
"interfaceName": "eth0",
"ipAddress": "192.168.124.20",
"ipAddresses": [
"192.168.124.20",
"fe80::6cae:e4ff:fefc:bd02"
],
"mac": "6e:ae:e4:fc:bd:02",
"name": "physnet-blue",
"queueCount": 1
}
]
$ virtctl console vm-blue-1
Successfully connected to vm-blue-1 console. The escape sequence is ^]
[fedora@vm-blue-1 ~]$ ping 192.168.124.20
PING 192.168.124.20 (192.168.124.20) 56(84) bytes of data.
64 bytes from 192.168.124.20: icmp_seq=1 ttl=64 time=0.531 ms
64 bytes from 192.168.124.20: icmp_seq=2 ttl=64 time=0.255 ms
64 bytes from 192.168.124.20: icmp_seq=3 ttl=64 time=0.688 ms
64 bytes from 192.168.124.20: icmp_seq=4 ttl=64 time=0.648 ms
--- 192.168.124.20 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3047ms
rtt min/avg/max/mdev = 0.255/0.530/0.688/0.169 ms
We can now start HTTP servers on the host, listening on the IPs configured on the VLAN interfaces:
python3 -m http.server --bind 192.168.123.1 9000 &
python3 -m http.server --bind 192.168.124.1 9000 &
And finally curl this from your client (blue network):
[fedora@vm-blue-1 ~]$ curl -v 192.168.124.1:9000
* Trying 192.168.124.1:9000...
* Connected to 192.168.124.1 (192.168.124.1) port 9000 (#0)
> GET / HTTP/1.1
> Host: 192.168.124.1:9000
> User-Agent: curl/7.69.1
> Accept: */*
>
* Mark bundle as not supporting multiuse
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Server: SimpleHTTP/0.6 Python/3.11.3
< Date: Thu, 01 Jun 2023 16:05:09 GMT
< Content-type: text/html; charset=utf-8
< Content-Length: 2923
...
And from the client connected to the red network:
[fedora@vm-red-1 ~]$ curl -v 192.168.123.1:9000
* Trying 192.168.123.1:9000...
* Connected to 192.168.123.1 (192.168.123.1) port 9000 (#0)
> GET / HTTP/1.1
> Host: 192.168.123.1:9000
> User-Agent: curl/7.69.1
> Accept: */*
>
* Mark bundle as not supporting multiuse
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Server: SimpleHTTP/0.6 Python/3.11.3
< Date: Thu, 01 Jun 2023 16:06:02 GMT
< Content-type: text/html; charset=utf-8
< Content-Length: 2923
<
...
In this post we have seen how to use OVN-Kubernetes to create secondary networks connected to the physical underlay, allowing both east/west communication between VMs and access to services running outside the Kubernetes cluster.
]]>