Secondary networks connected to the physical underlay for KubeVirt VMs using OVN-Kubernetes

Introduction

OVN (Open Virtual Network) is a series of daemons for Open vSwitch that translate virtual network configurations into OpenFlow. It provides virtual networking capabilities for any type of workload on a virtualized platform (virtual machines and containers) using the same API.

OVN provides a higher layer of abstraction than Open vSwitch, working with logical routers and logical switches rather than flows. More details can be found in the OVN architecture man page.
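
To make this concrete, here is a minimal, illustration-only sketch of how a logical switch and a logical port can be defined directly with the ovn-nbctl CLI (OVN-Kubernetes creates and manages these objects for you; the names my-switch and vm1-port are hypothetical):

# create a logical switch and attach a logical port to it
ovn-nbctl ls-add my-switch
ovn-nbctl lsp-add my-switch vm1-port
# assign a MAC and an IP address to the logical port
ovn-nbctl lsp-set-addresses vm1-port "0a:58:0a:80:00:05 10.128.0.5"
# inspect the resulting logical topology
ovn-nbctl show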

In this post we will repeat the scenario of the bridge CNI equivalent, this time using an SDN approach. This secondary network topology is akin to the flatL2 topology, but additionally allows connectivity to the physical underlay.

Demo

To run this demo, we will prepare a Kubernetes cluster with the following components installed:

- OVN-Kubernetes, providing the default cluster network and the secondary localnet networks
- multus-cni, to attach those secondary networks to the VMs
- KubeVirt, to run the virtual machines

The following section will show you how to create a KinD cluster with the latest upstream OVN-Kubernetes and multus-cni deployed.

Setup demo environment

Refer to the OVN-Kubernetes repo KIND documentation for more details; the gist of it is you should clone the OVN-Kubernetes repository, and run their kind helper script:

git clone git@github.com:ovn-org/ovn-kubernetes.git

cd ovn-kubernetes
pushd contrib ; ./kind.sh --multi-network-enable ; popd

This will get you a running KinD cluster configured to use OVN-Kubernetes as the default cluster network, with the OVN-Kubernetes multi-homing feature gate enabled, and multus-cni deployed in the cluster.
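
Before moving on, you can confirm the cluster nodes and the OVN-Kubernetes control plane are up:

kubectl get nodes
kubectl -n ovn-kubernetes get pods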

Install KubeVirt in the cluster

Follow KubeVirt’s user guide to install the latest released version (currently v0.59.0).

export RELEASE=$(curl https://storage.googleapis.com/kubevirt-prow/release/kubevirt/kubevirt/stable.txt)
kubectl apply -f "https://github.com/kubevirt/kubevirt/releases/download/${RELEASE}/kubevirt-operator.yaml"
kubectl apply -f "https://github.com/kubevirt/kubevirt/releases/download/${RELEASE}/kubevirt-cr.yaml"
kubectl -n kubevirt wait kv kubevirt --timeout=360s --for condition=Available
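
The demo also uses virtctl to access the VM consoles. If you do not have it installed yet, one possible way to get it (assuming a Linux amd64 workstation; adjust the suffix for other platforms) is to download the matching release binary:

wget "https://github.com/kubevirt/kubevirt/releases/download/${RELEASE}/virtctl-${RELEASE}-linux-amd64" -O virtctl
sudo install virtctl /usr/local/bin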

Now we have a Kubernetes cluster with all the pieces to start the demo.

Single broadcast domain

In this scenario we will see how traffic from a single localnet network can be connected to a physical network on the host using a dedicated OVS bridge.

This scenario does not use any VLAN encapsulation, and is thus simpler: the network admin does not need to provision any VLANs in advance.

Configuring the underlay

When you started the KinD cluster with the --multi-network-enable flag, an additional OCI network was created and attached to each of the KinD nodes.

Still, further steps may be required, depending on the desired L2 configuration.

Let’s first create a dedicated OVS bridge, and attach the aforementioned virtualized network to it:

for node in $(kubectl -n ovn-kubernetes get pods -l app=ovs-node -o jsonpath="{.items[*].metadata.name}")
do
	kubectl -n ovn-kubernetes exec -ti $node -- ovs-vsctl --may-exist add-br ovsbr1
	kubectl -n ovn-kubernetes exec -ti $node -- ovs-vsctl --may-exist add-port ovsbr1 eth1
	kubectl -n ovn-kubernetes exec -ti $node -- ovs-vsctl set open . external_ids:ovn-bridge-mappings=physnet:breth0,localnet-network:ovsbr1
done

The first two commands are self-evident: they create an OVS bridge and attach a port to it; the last one is not. In it, we use the OVN bridge-mapping API to configure which OVS bridge must be used for each physical network. OVN creates a patch port between the OVN integration bridge - br-int - and the OVS bridge you point it at, and traffic is forwarded to/from it with the help of a localnet port.

NOTE: the network name in the provided mapping must match the name within the net-attach-def.Spec.Config JSON; otherwise, the patch ports will not be created.
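
If you want to double check the configuration, the sketch below reads the bridge mappings back and lists the ports on the new bridge, picking an arbitrary ovs-node pod (the pod name will differ in your cluster). Once the network attachment definition shown later is created, you should eventually also see an OVN-created patch port towards br-int on ovsbr1:

OVS_POD=$(kubectl -n ovn-kubernetes get pods -l app=ovs-node -o jsonpath="{.items[0].metadata.name}")
# show the physical network to OVS bridge mappings configured above
kubectl -n ovn-kubernetes exec -ti $OVS_POD -- ovs-vsctl get open . external_ids:ovn-bridge-mappings
# list the ports attached to the dedicated bridge (eth1, plus the patch port once it exists)
kubectl -n ovn-kubernetes exec -ti $OVS_POD -- ovs-vsctl list-ports ovsbr1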

You will also have to configure an IP address on the bridge of the extra network the kind script created. For that, you first need to identify the bridge’s name; the example below uses the podman runtime:

podman network inspect underlay --format '{{ .NetworkInterface }}'
podman3

ip addr add 10.128.0.1/24 dev podman3

NOTE: for docker, please use the following command:

ip a | grep `docker network inspect underlay --format '{{ index .IPAM.Config 0 "Gateway" }}'` | awk '{print $NF}'
br-0aeb0318f71f

ip addr add 10.128.0.1/24 dev br-0aeb0318f71f

Note that we used an IP address within the network’s subnet (defined in the NAD below). This IP address must be excluded from the IPAM pool (also in the NAD), otherwise the OVN-Kubernetes IPAM may assign it to a workload.

Defining the OVN-Kubernetes networks

Once the underlay is configured, we can now provision the attachment configuration:

---
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: localnet-network
spec:
  config: |2
    {
            "cniVersion": "0.3.1",
            "name": "localnet-network",
            "type": "ovn-k8s-cni-overlay",
            "topology": "localnet",
            "subnets": "10.128.0.0/24",
            "excludeSubnets": "10.128.0.1/32",
            "netAttachDefName": "default/localnet-network"
    }

It is required to list the gateway IP in the excludeSubnets attribute, thus preventing OVN-Kubernetes from assigning that IP address to any workload.
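
Assuming you saved the manifest above as localnet-network.yaml (the file name is arbitrary), apply it and confirm the NAD exists in the default namespace:

kubectl apply -f localnet-network.yaml
kubectl get network-attachment-definitions localnet-network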

Spin up the VMs

These two VMs can be used for the single broadcast domain scenario (no VLANs).

---
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  name: vm-server
spec:
  runStrategy: Always
  template:
    spec:
      nodeSelector:
        kubernetes.io/hostname: ovn-worker
      domain:
        devices:
          disks:
            - name: containerdisk
              disk:
                bus: virtio
            - name: cloudinitdisk
              disk:
                bus: virtio
          interfaces:
          - name: localnet
            bridge: {}
        machine:
          type: ""
        resources:
          requests:
            memory: 1024M
      networks:
      - name: localnet
        multus:
          networkName: localnet-network
      terminationGracePeriodSeconds: 0
      volumes:
        - name: containerdisk
          containerDisk:
            image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:devel
        - name: cloudinitdisk
          cloudInitNoCloud:
            networkData: |
              version: 2
              ethernets:
                eth0:
                  dhcp4: true
            userData: |-
              #cloud-config
              password: fedora
              chpasswd: { expire: False }
---
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  name: vm-client
spec:
  runStrategy: Always
  template:
    spec:
      nodeSelector:
        kubernetes.io/hostname: ovn-worker2
      domain:
        devices:
          disks:
            - name: containerdisk
              disk:
                bus: virtio
            - name: cloudinitdisk
              disk:
                bus: virtio
          interfaces:
          - name: localnet
            bridge: {}
        machine:
          type: ""
        resources:
          requests:
            memory: 1024M
      networks:
      - name: localnet
        multus:
          networkName: localnet-network
      terminationGracePeriodSeconds: 0
      volumes:
        - name: containerdisk
          containerDisk:
            image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:devel
        - name: cloudinitdisk
          cloudInitNoCloud:
            networkData: |
              version: 2
              ethernets:
                eth0:
                  dhcp4: true
            userData: |-
              #cloud-config
              password: fedora
              chpasswd: { expire: False }
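
Assuming both manifests above are saved in a file named vms-localnet.yaml (again, an arbitrary name), create the VMs and wait for them to become ready:

kubectl apply -f vms-localnet.yaml
for vm in vm-server vm-client; do
	kubectl wait vm $vm --for=condition=Ready --timeout=5m
done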

Test East / West communication

You can check east/west connectivity between both VMs via ICMP:

$ kubectl get vmi vm-server -ojsonpath="{ @.status.interfaces }" | jq
[
  {
    "infoSource": "domain, guest-agent, multus-status",
    "interfaceName": "eth0",
    "ipAddress": "10.128.0.2",
    "ipAddresses": [
      "10.128.0.2",
      "fe80::e83d:16ff:fe76:c1bd"
    ],
    "mac": "ea:3d:16:76:c1:bd",
    "name": "localnet",
    "queueCount": 1
  }
]

$ virtctl console vm-client
Successfully connected to vm-client console. The escape sequence is ^]

[fedora@vm-client ~]$ ping 10.128.0.2
PING 10.128.0.2 (10.128.0.2) 56(84) bytes of data.
64 bytes from 10.128.0.2: icmp_seq=1 ttl=64 time=0.808 ms
64 bytes from 10.128.0.2: icmp_seq=2 ttl=64 time=0.478 ms
64 bytes from 10.128.0.2: icmp_seq=3 ttl=64 time=0.536 ms
64 bytes from 10.128.0.2: icmp_seq=4 ttl=64 time=0.507 ms

--- 10.128.0.2 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3005ms
rtt min/avg/max/mdev = 0.478/0.582/0.808/0.131 ms

Check underlay services

We can now start an HTTP server on the host, listening on the IP address we configured on the gateway bridge:

python3 -m http.server --bind 10.128.0.1 9000

And finally, curl it from the client VM:

[fedora@vm-client ~]$ curl -v 10.128.0.1:9000
*   Trying 10.128.0.1:9000...
* Connected to 10.128.0.1 (10.128.0.1) port 9000 (#0)
> GET / HTTP/1.1
> Host: 10.128.0.1:9000
> User-Agent: curl/7.69.1
> Accept: */*
> 
* Mark bundle as not supporting multiuse
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Server: SimpleHTTP/0.6 Python/3.11.3
< Date: Thu, 01 Jun 2023 16:05:09 GMT
< Content-type: text/html; charset=utf-8
< Content-Length: 2923
...

Multiple physical networks pointing to the same OVS bridge

This example features two physical networks, each with a different VLAN, both pointing at the same OVS bridge.

Configuring the underlay

Again, the first thing to do is create a dedicated OVS bridge and attach the aforementioned virtualized network to it, this time defining the port as a trunk for two broadcast domains, with VLAN tags 10 and 20.

for node in $(kubectl -n ovn-kubernetes get pods -l app=ovs-node -o jsonpath="{.items[*].metadata.name}")
do
	kubectl -n ovn-kubernetes exec -ti $node -- ovs-vsctl --may-exist add-br ovsbr1
	kubectl -n ovn-kubernetes exec -ti $node -- ovs-vsctl --may-exist add-port ovsbr1 eth1 trunks=10,20 vlan_mode=trunk
	kubectl -n ovn-kubernetes exec -ti $node -- ovs-vsctl set open . external_ids:ovn-bridge-mappings=physnet:breth0,tenantblue:ovsbr1,tenantred:ovsbr1
done

We must now configure the physical network: since packets leave the OVS bridge tagged with either VLAN 10 or VLAN 20, the physical network where the virtualized nodes run has to handle the tagged traffic.

For that we will create two VLAN interfaces, each on a different subnet, on top of the bridge the kind script created for the extra network; as before, we first need to identify that bridge’s name (for docker, see the previous example). Each VLAN interface also needs to be configured with an IP address:

podman network inspect underlay --format '{{ .NetworkInterface }}'
podman3

# create the VLANs
ip link add link podman3 name podman3.10 type vlan id 10
ip addr add 192.168.123.1/24 dev podman3.10
ip link set dev podman3.10 up

ip link add link podman3 name podman3.20 type vlan id 20
ip addr add 192.168.124.1/24 dev podman3.20
ip link set dev podman3.20 up

NOTE: both the tenantblue and tenantred networks forward their traffic to the ovsbr1 OVS bridge.
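
As a sanity check, you can read back the VLAN configuration of the eth1 port from any of the nodes (again picking an arbitrary ovs-node pod):

OVS_POD=$(kubectl -n ovn-kubernetes get pods -l app=ovs-node -o jsonpath="{.items[0].metadata.name}")
kubectl -n ovn-kubernetes exec -ti $OVS_POD -- ovs-vsctl get Port eth1 vlan_mode trunks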

Defining the OVN-Kubernetes networks

Let us now provision the attachment configuration for the two physical networks. Notice they do not have a subnet defined, which means our workloads must configure static IPs via cloud-init.

---
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: tenantred
spec:
  config: |2
    {
            "cniVersion": "0.3.1",
            "name": "tenantred",
            "type": "ovn-k8s-cni-overlay",
            "topology": "localnet",
            "vlanID": 10,
            "netAttachDefName": "default/tenantred"
    }
---
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: tenantblue
spec:
  config: |2
    {
            "cniVersion": "0.3.1",
            "name": "tenantblue",
            "type": "ovn-k8s-cni-overlay",
            "topology": "localnet",
            "vlanID": 20,
            "netAttachDefName": "default/tenantblue"
    }

NOTE: each of the tenantblue and tenantred networks tags its traffic with a different VLAN, which must be listed in the port’s trunks configuration.
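
As before, apply both manifests (assuming they are saved as tenants.yaml) and confirm the NADs were created:

kubectl apply -f tenants.yaml
kubectl get network-attachment-definitions tenantred tenantblue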

Spin up the VMs

These two VMs can be used for the OVS bridge sharing scenario (two physical networks share the same OVS bridge, each on a different VLAN).

---
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  name: vm-red-1
spec:
  runStrategy: Always
  template:
    spec:
      nodeSelector:
        kubernetes.io/hostname: ovn-worker
      domain:
        devices:
          disks:
            - name: containerdisk
              disk:
                bus: virtio
            - name: cloudinitdisk
              disk:
                bus: virtio
          interfaces:
          - name: physnet-red
            bridge: {}
        machine:
          type: ""
        resources:
          requests:
            memory: 1024M
      networks:
      - name: physnet-red
        multus:
          networkName: tenantred
      terminationGracePeriodSeconds: 0
      volumes:
        - name: containerdisk
          containerDisk:
            image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:devel
        - name: cloudinitdisk
          cloudInitNoCloud:
            networkData: |
              version: 2
              ethernets:
                eth0:
                  addresses: [ 192.168.123.10/24 ]
            userData: |-
              #cloud-config
              password: fedora
              chpasswd: { expire: False }
---
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  name: vm-red-2
spec:
  runStrategy: Always
  template:
    spec:
      nodeSelector:
        kubernetes.io/hostname: ovn-worker
      domain:
        devices:
          disks:
            - name: containerdisk
              disk:
                bus: virtio
            - name: cloudinitdisk
              disk:
                bus: virtio
          interfaces:
          - name: physnet-red
            bridge: {}
        machine:
          type: ""
        resources:
          requests:
            memory: 1024M
      networks:
      - name: physnet-red
        multus:
          networkName: tenantred
      terminationGracePeriodSeconds: 0
      volumes:
        - name: containerdisk
          containerDisk:
            image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:devel
        - name: cloudinitdisk
          cloudInitNoCloud:
            networkData: |
              version: 2
              ethernets:
                eth0:
                  addresses: [ 192.168.123.20/24 ]
            userData: |-
              #cloud-config
              password: fedora
              chpasswd: { expire: False }
---
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  name: vm-blue-1
spec:
  runStrategy: Always
  template:
    spec:
      nodeSelector:
        kubernetes.io/hostname: ovn-worker
      domain:
        devices:
          disks:
            - name: containerdisk
              disk:
                bus: virtio
            - name: cloudinitdisk
              disk:
                bus: virtio
          interfaces:
          - name: physnet-blue
            bridge: {}
        machine:
          type: ""
        resources:
          requests:
            memory: 1024M
      networks:
      - name: physnet-blue
        multus:
          networkName: tenantblue
      terminationGracePeriodSeconds: 0
      volumes:
        - name: containerdisk
          containerDisk:
            image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:devel
        - name: cloudinitdisk
          cloudInitNoCloud:
            networkData: |
              version: 2
              ethernets:
                eth0:
                  addresses: [ 192.168.124.10/24 ]
            userData: |-
              #cloud-config
              password: fedora
              chpasswd: { expire: False }
---
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  name: vm-blue-2
spec:
  runStrategy: Always
  template:
    spec:
      nodeSelector:
        kubernetes.io/hostname: ovn-worker
      domain:
        devices:
          disks:
            - name: containerdisk
              disk:
                bus: virtio
            - name: cloudinitdisk
              disk:
                bus: virtio
          interfaces:
          - name: physnet-blue
            bridge: {}
        machine:
          type: ""
        resources:
          requests:
            memory: 1024M
      networks:
      - name: physnet-blue
        multus:
          networkName: tenantblue
      terminationGracePeriodSeconds: 0
      volumes:
        - name: containerdisk
          containerDisk:
            image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:devel
        - name: cloudinitdisk
          cloudInitNoCloud:
            networkData: |
              version: 2
              ethernets:
                eth0:
                  addresses: [ 192.168.124.20/24 ]
            userData: |-
              #cloud-config
              password: fedora
              chpasswd: { expire: False }
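
Assuming the four manifests above are saved in a file named vms-vlans.yaml, create the VMs and wait for them to become ready:

kubectl apply -f vms-vlans.yaml
for vm in vm-red-1 vm-red-2 vm-blue-1 vm-blue-2; do
	kubectl wait vm $vm --for=condition=Ready --timeout=5m
done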

Test East / West communication

You can check east/west connectivity between both red VMs via ICMP:

$ kubectl get vmi vm-red-2 -ojsonpath="{ @.status.interfaces }" | jq
[
  {
    "infoSource": "domain, guest-agent",
    "interfaceName": "eth0",
    "ipAddress": "192.168.123.20",
    "ipAddresses": [
      "192.168.123.20",
      "fe80::e83d:16ff:fe76:c1bd"
    ],
    "mac": "ea:3d:16:76:c1:bd",
    "name": "flatl2-overlay",
    "queueCount": 1
  }
]

$ virtctl console vm-red-1
Successfully connected to vm-red-1 console. The escape sequence is ^]

[fedora@vm-red-1 ~]$ ping 192.168.123.20
PING 192.168.123.20 (192.168.123.20) 56(84) bytes of data.
64 bytes from 192.168.123.20: icmp_seq=1 ttl=64 time=0.534 ms
64 bytes from 192.168.123.20: icmp_seq=2 ttl=64 time=0.246 ms
64 bytes from 192.168.123.20: icmp_seq=3 ttl=64 time=0.178 ms
64 bytes from 192.168.123.20: icmp_seq=4 ttl=64 time=0.236 ms

--- 192.168.123.20 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3028ms
rtt min/avg/max/mdev = 0.178/0.298/0.534/0.138 ms

The same behavior can be seen on the VMs attached to the blue network:

$ kubectl get vmi vm-blue-2 -ojsonpath="{ @.status.interfaces }" | jq
[
  {
    "infoSource": "domain, guest-agent",
    "interfaceName": "eth0",
    "ipAddress": "192.168.124.20",
    "ipAddresses": [
      "192.168.124.20",
      "fe80::6cae:e4ff:fefc:bd02"
    ],
    "mac": "6e:ae:e4:fc:bd:02",
    "name": "physnet-blue",
    "queueCount": 1
  }
]

$ virtctl console vm-blue-1
Successfully connected to vm-blue-1 console. The escape sequence is ^]

[fedora@vm-blue-1 ~]$ ping 192.168.124.20
PING 192.168.124.20 (192.168.124.20) 56(84) bytes of data.
64 bytes from 192.168.124.20: icmp_seq=1 ttl=64 time=0.531 ms
64 bytes from 192.168.124.20: icmp_seq=2 ttl=64 time=0.255 ms
64 bytes from 192.168.124.20: icmp_seq=3 ttl=64 time=0.688 ms
64 bytes from 192.168.124.20: icmp_seq=4 ttl=64 time=0.648 ms

--- 192.168.124.20 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3047ms
rtt min/avg/max/mdev = 0.255/0.530/0.688/0.169 ms

Accessing the underlay services

We can now start HTTP servers on the host, listening on the IP addresses configured on the VLAN interfaces:

python3 -m http.server --bind 192.168.123.1 9000 &
python3 -m http.server --bind 192.168.124.1 9000 &

And finally, curl from the client attached to the blue network:

[fedora@vm-blue-1 ~]$ curl -v 192.168.124.1:9000
*   Trying 192.168.124.1:9000...
* Connected to 192.168.124.1 (192.168.124.1) port 9000 (#0)
> GET / HTTP/1.1
> Host: 192.168.124.1:9000
> User-Agent: curl/7.69.1
> Accept: */*
> 
* Mark bundle as not supporting multiuse
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Server: SimpleHTTP/0.6 Python/3.11.3
< Date: Thu, 01 Jun 2023 16:05:09 GMT
< Content-type: text/html; charset=utf-8
< Content-Length: 2923
...

And from the client connected to the red network:

[fedora@vm-red-1 ~]$ curl -v 192.168.123.1:9000
*   Trying 192.168.123.1:9000...
* Connected to 192.168.123.1 (192.168.123.1) port 9000 (#0)
> GET / HTTP/1.1
> Host: 192.168.123.1:9000
> User-Agent: curl/7.69.1
> Accept: */*
> 
* Mark bundle as not supporting multiuse
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Server: SimpleHTTP/0.6 Python/3.11.3
< Date: Thu, 01 Jun 2023 16:06:02 GMT
< Content-type: text/html; charset=utf-8
< Content-Length: 2923
< 
...

Conclusions

In this post we have seen how to use OVN-Kubernetes to create secondary networks connected to the physical underlay, allowing both east/west communication between VMs, and access to services running outside the Kubernetes cluster.