NetworkPolicies for KubeVirt VMs secondary networks using OVN-Kubernetes
Introduction
Kubernetes NetworkPolicies are constructs to control traffic flow at the IP address or port level (OSI layers 3 or 4). They allow the user to specify how a pod (or group of pods) is allowed to communicate with other entities on the network. In simpler words: the user can specify ingress from or egress to other workloads, using L3 / L4 semantics.
Keeping in mind NetworkPolicy
is a Kubernetes construct - which only cares
about a single network interface - they are only usable for the cluster’s
default network interface. This leaves a considerable gap for Virtual Machine
users, since they are heavily invested in secondary networks.
The k8snetworkplumbingwg has addressed this limitation by providing a
MultiNetworkPolicy
CRD - it features the exact same API as NetworkPolicy
but can target network-attachment-definitions.
OVN-Kubernetes implements this API, and configures access control accordingly
for secondary networks in the cluster.
In this post we will see how we can govern access control for VMs using the multi-network policy API. On our simple example, we’ll only allow into our VMs for traffic ingressing from a particular CIDR range.
Current limitations of MultiNetworkPolicies
for VMs
Kubernetes NetworkPolicy
has three types of policy peers:
- namespace selectors: allows ingress-from, egress-to based on the peer’s namespace labels
- pod selectors: allows ingress-from, egress-to based on the peer’s labels
- ip block: allows ingress-from, egress-to based on the peer’s IP address
While MultiNetworkPolicy
allows these three types, when used with VMs we
recommend using only the IPBlock
policy peer - both namespace
and pod
selectors prevent the live-migration of Virtual Machines (these policy peers
require OVN-K managed IPAM, and currently the live-migration feature is only
available when IPAM is not enabled on the interfaces).
Demo
To run this demo, we will prepare a Kubernetes cluster with the following components installed:
The following section will show you how to create a KinD cluster, with upstream latest OVN-Kubernetes, upstream latest multus-cni, and the multi-network policy CRDs deployed.
Setup demo environment
Refer to the OVN-Kubernetes repo KIND documentation for more details; the gist of it is you should clone the OVN-Kubernetes repository, and run their kind helper script:
git clone git@github.com:ovn-org/ovn-kubernetes.git
cd ovn-kubernetes
pushd contrib ; ./kind.sh --multi-network-enable ; popd
This will get you a running kind cluster (one control plane, and two worker
nodes), configured to use OVN-Kubernetes as the default cluster network,
configuring the multi-homing OVN-Kubernetes feature gate, and deploying
multus-cni
in the cluster.
Install KubeVirt in the cluster
Follow Kubevirt’s user guide to install the latest released version (currently, v1.0.0).
export RELEASE=$(curl https://storage.googleapis.com/kubevirt-prow/release/kubevirt/kubevirt/stable.txt)
kubectl apply -f "https://github.com/kubevirt/kubevirt/releases/download/${RELEASE}/kubevirt-operator.yaml"
kubectl apply -f "https://github.com/kubevirt/kubevirt/releases/download/${RELEASE}/kubevirt-cr.yaml"
kubectl -n kubevirt wait kv kubevirt --timeout=360s --for condition=Available
Now we have a Kubernetes cluster with all the pieces to start the Demo.
Limiting ingress to a KubeVirt VM
In this example, we will configure a MultiNetworkPolicy
allowing ingress into
our VMs only from a particular CIDR range - let’s say 10.200.0.0/30
.
Provision the following NAD (to allow our VMs to live-migrate, we do not define
a subnet
):
---
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
name: flatl2net
spec:
config: |2
{
"cniVersion": "0.4.0",
"name": "flatl2net",
"type": "ovn-k8s-cni-overlay",
"topology":"layer2",
"netAttachDefName": "default/flatl2net"
}
Let’s now provision our six VMs, with the following name to IP address (statically configured via cloud-init) association:
- vm1: 10.200.0.1
- vm2: 10.200.0.2
- vm3: 10.200.0.3
- vm4: 10.200.0.4
- vm5: 10.200.0.5
- vm6: 10.200.0.6
---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
labels:
kubevirt.io/vm: vm1
name: vm1
spec:
running: true
template:
metadata:
labels:
name: access-control
kubevirt.io/domain: vm1
kubevirt.io/vm: vm1
spec:
domain:
devices:
disks:
- disk:
bus: virtio
name: containerdisk
- disk:
bus: virtio
name: cloudinitdisk
interfaces:
- bridge: {}
name: flatl2-overlay
rng: {}
resources:
requests:
memory: 1024Mi
networks:
- multus:
networkName: flatl2net
name: flatl2-overlay
termination/GracePeriodSeconds: 30
volumes:
- containerDisk:
image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:v1.0.0
name: containerdisk
- cloudInitNoCloud:
networkData: |
ethernets:
eth0:
addresses:
- 10.200.0.1/24
version: 2
userData: |-
#cloud-config
user: fedora
password: password
chpasswd: { expire: False }
name: cloudinitdisk
---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
labels:
kubevirt.io/vm: vm2
name: vm2
spec:
running: true
template:
metadata:
labels:
name: access-control
kubevirt.io/domain: vm2
kubevirt.io/vm: vm2
spec:
domain:
devices:
disks:
- disk:
bus: virtio
name: containerdisk
- disk:
bus: virtio
name: cloudinitdisk
interfaces:
- bridge: {}
name: flatl2-overlay
rng: {}
resources:
requests:
memory: 1024Mi
networks:
- multus:
networkName: flatl2net
name: flatl2-overlay
termination/GracePeriodSeconds: 30
volumes:
- containerDisk:
image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:v1.0.0
name: containerdisk
- cloudInitNoCloud:
networkData: |
ethernets:
eth0:
addresses:
- 10.200.0.2/24
version: 2
userData: |-
#cloud-config
user: fedora
password: password
chpasswd: { expire: False }
name: cloudinitdisk
---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
labels:
kubevirt.io/vm: vm3
name: vm3
spec:
running: true
template:
metadata:
labels:
name: access-control
kubevirt.io/domain: vm3
kubevirt.io/vm: vm3
spec:
domain:
devices:
disks:
- disk:
bus: virtio
name: containerdisk
- disk:
bus: virtio
name: cloudinitdisk
interfaces:
- bridge: {}
name: flatl2-overlay
rng: {}
resources:
requests:
memory: 1024Mi
networks:
- multus:
networkName: flatl2net
name: flatl2-overlay
termination/GracePeriodSeconds: 30
volumes:
- containerDisk:
image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:v1.0.0
name: containerdisk
- cloudInitNoCloud:
networkData: |
ethernets:
eth0:
addresses:
- 10.200.0.3/24
version: 2
userData: |-
#cloud-config
user: fedora
password: password
chpasswd: { expire: False }
name: cloudinitdisk
---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
labels:
kubevirt.io/vm: vm4
name: vm4
spec:
running: true
template:
metadata:
labels:
name: access-control
kubevirt.io/domain: vm4
kubevirt.io/vm: vm4
spec:
domain:
devices:
disks:
- disk:
bus: virtio
name: containerdisk
- disk:
bus: virtio
name: cloudinitdisk
interfaces:
- bridge: {}
name: flatl2-overlay
rng: {}
resources:
requests:
memory: 1024Mi
networks:
- multus:
networkName: flatl2net
name: flatl2-overlay
termination/GracePeriodSeconds: 30
volumes:
- containerDisk:
image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:v1.0.0
name: containerdisk
- cloudInitNoCloud:
networkData: |
ethernets:
eth0:
addresses:
- 10.200.0.4/24
version: 2
userData: |-
#cloud-config
user: fedora
password: password
chpasswd: { expire: False }
name: cloudinitdisk
---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
labels:
kubevirt.io/vm: vm5
name: vm5
spec:
running: true
template:
metadata:
labels:
name: access-control
kubevirt.io/domain: vm5
kubevirt.io/vm: vm5
spec:
domain:
devices:
disks:
- disk:
bus: virtio
name: containerdisk
- disk:
bus: virtio
name: cloudinitdisk
interfaces:
- bridge: {}
name: flatl2-overlay
rng: {}
resources:
requests:
memory: 1024Mi
networks:
- multus:
networkName: flatl2net
name: flatl2-overlay
termination/GracePeriodSeconds: 30
volumes:
- containerDisk:
image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:v1.0.0
name: containerdisk
- cloudInitNoCloud:
networkData: |
ethernets:
eth0:
addresses:
- 10.200.0.5/24
version: 2
userData: |-
#cloud-config
user: fedora
password: password
chpasswd: { expire: False }
name: cloudinitdisk
---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
labels:
kubevirt.io/vm: vm6
name: vm6
spec:
running: true
template:
metadata:
labels:
name: access-control
kubevirt.io/domain: vm6
kubevirt.io/vm: vm6
spec:
domain:
devices:
disks:
- disk:
bus: virtio
name: containerdisk
- disk:
bus: virtio
name: cloudinitdisk
interfaces:
- bridge: {}
name: flatl2-overlay
rng: {}
resources:
requests:
memory: 1024Mi
networks:
- multus:
networkName: flatl2net
name: flatl2-overlay
termination/GracePeriodSeconds: 30
volumes:
- containerDisk:
image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:v1.0.0
name: containerdisk
- cloudInitNoCloud:
networkData: |
ethernets:
eth0:
addresses:
- 10.200.0.6/24
version: 2
userData: |-
#cloud-config
user: fedora
password: password
chpasswd: { expire: False }
name: cloudinitdisk
NOTE: it is important to highlight all the Virtual Machines (and the
network-attachment-definition
) are defined in the default
namespace.
After this step, we should have the following deployment:
Let’s check the VMs vm1
and vm4
can ping their peers in the same subnet.
For that we will
connect to the VMs over their serial console:
First, let’s check vm1
:
âžś virtctl console vm1
Successfully connected to vm1 console. The escape sequence is ^]
[fedora@vm1 ~]$ ping 10.200.0.2 -c 4
PING 10.200.0.2 (10.200.0.2) 56(84) bytes of data.
64 bytes from 10.200.0.2: icmp_seq=1 ttl=64 time=5.16 ms
64 bytes from 10.200.0.2: icmp_seq=2 ttl=64 time=1.41 ms
64 bytes from 10.200.0.2: icmp_seq=3 ttl=64 time=34.2 ms
64 bytes from 10.200.0.2: icmp_seq=4 ttl=64 time=2.56 ms
--- 10.200.0.2 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3005ms
rtt min/avg/max/mdev = 1.406/10.841/34.239/13.577 ms
[fedora@vm1 ~]$ ping 10.200.0.6 -c 4
PING 10.200.0.6 (10.200.0.6) 56(84) bytes of data.
64 bytes from 10.200.0.6: icmp_seq=1 ttl=64 time=3.77 ms
64 bytes from 10.200.0.6: icmp_seq=2 ttl=64 time=1.46 ms
64 bytes from 10.200.0.6: icmp_seq=3 ttl=64 time=5.47 ms
64 bytes from 10.200.0.6: icmp_seq=4 ttl=64 time=1.74 ms
--- 10.200.0.6 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3007ms
rtt min/avg/max/mdev = 1.459/3.109/5.469/1.627 ms
[fedora@vm1 ~]$
And from vm4:
âžś ~ virtctl console vm4
Successfully connected to vm4 console. The escape sequence is ^]
[fedora@vm4 ~]$ ping 10.200.0.1 -c 4
PING 10.200.0.1 (10.200.0.1) 56(84) bytes of data.
64 bytes from 10.200.0.1: icmp_seq=1 ttl=64 time=3.20 ms
64 bytes from 10.200.0.1: icmp_seq=2 ttl=64 time=1.62 ms
64 bytes from 10.200.0.1: icmp_seq=3 ttl=64 time=1.44 ms
64 bytes from 10.200.0.1: icmp_seq=4 ttl=64 time=0.951 ms
--- 10.200.0.1 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3006ms
rtt min/avg/max/mdev = 0.951/1.803/3.201/0.843 ms
[fedora@vm4 ~]$ ping 10.200.0.6 -c 4
PING 10.200.0.6 (10.200.0.6) 56(84) bytes of data.
64 bytes from 10.200.0.6: icmp_seq=1 ttl=64 time=1.85 ms
64 bytes from 10.200.0.6: icmp_seq=2 ttl=64 time=1.02 ms
64 bytes from 10.200.0.6: icmp_seq=3 ttl=64 time=1.27 ms
64 bytes from 10.200.0.6: icmp_seq=4 ttl=64 time=0.970 ms
--- 10.200.0.6 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3005ms
rtt min/avg/max/mdev = 0.970/1.275/1.850/0.350 ms
We will now provision a MultiNetworkPolicy
applying to all the VMs defined
above. To do this mapping correcly, the policy has to:
- Be in the same namespace as the VM.
- Set
k8s.v1.cni.cncf.io/policy-for
annotation matching the secondary network used by the VM. - Set
matchLabels
selector matching the labels set on VM’sspec.template.metadata
.
This policy will allow ingress into these access-control
labeled pods
only if the traffic originates from within the 10.200.0.0/30
CIDR range
(IPs 10.200.0.1-3).
---
apiVersion: k8s.cni.cncf.io/v1beta1
kind: MultiNetworkPolicy
metadata:
name: ingress-ipblock
annotations:
k8s.v1.cni.cncf.io/policy-for: default/flatl2net
spec:
podSelector:
matchLabels:
name: access-control
policyTypes:
- Ingress
ingress:
- from:
- ipBlock:
cidr: 10.200.0.0/30
Taking into account our example, only
vm1
, vm2
, and vm3
will be able to contact any of its peers, as pictured
by the following diagram:
Let’s try again the ping after provisioning the MultiNetworkPolicy
object:
From vm1
(inside the allowed ip block range):
[fedora@vm1 ~]$ ping 10.200.0.2 -c 4
PING 10.200.0.2 (10.200.0.2) 56(84) bytes of data.
64 bytes from 10.200.0.2: icmp_seq=1 ttl=64 time=6.48 ms
64 bytes from 10.200.0.2: icmp_seq=2 ttl=64 time=4.40 ms
64 bytes from 10.200.0.2: icmp_seq=3 ttl=64 time=1.28 ms
64 bytes from 10.200.0.2: icmp_seq=4 ttl=64 time=1.51 ms
--- 10.200.0.2 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3006ms
rtt min/avg/max/mdev = 1.283/3.418/6.483/2.154 ms
[fedora@vm1 ~]$ ping 10.200.0.6 -c 4
PING 10.200.0.6 (10.200.0.6) 56(84) bytes of data.
64 bytes from 10.200.0.6: icmp_seq=1 ttl=64 time=3.81 ms
64 bytes from 10.200.0.6: icmp_seq=2 ttl=64 time=2.67 ms
64 bytes from 10.200.0.6: icmp_seq=3 ttl=64 time=1.68 ms
64 bytes from 10.200.0.6: icmp_seq=4 ttl=64 time=1.63 ms
--- 10.200.0.6 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3006ms
rtt min/avg/max/mdev = 1.630/2.446/3.808/0.888 ms
From vm4
(outside the allowed ip block range):
[fedora@vm4 ~]$ ping 10.200.0.1 -c 4
PING 10.200.0.1 (10.200.0.1) 56(84) bytes of data.
--- 10.200.0.1 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3083ms
[fedora@vm4 ~]$ ping 10.200.0.6 -c 4
PING 10.200.0.6 (10.200.0.6) 56(84) bytes of data.
--- 10.200.0.6 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3089ms
Conclusions
In this post we’ve shown how MultiNetworkPolicies
can be used to provide
access control to VMs with secondary network interfaces.
We have provided a comprehensive example on how a policy can be used to limit ingress to our VMs only from desired sources, based on the client’s IP address.