When running VMs using ODF storage with ‘rbd’ mounter or ‘rbd.csi.ceph.com’ provisioner, Windows VMs may cause reports of bad crc/signature errors due to certain I/O patterns. Cluster performance can be severely degraded if the number of re-transmissions due to crc errors causes network saturation. ‘krbd:rxbounce’ should be configured for the VM storage class to prevent these crc errors, the “ocs-storagecluster-ceph-rbd-virtualization” storage class uses this option by default, if available.
Cluster may report a huge number of CRC errors and the cluster might experience major service outages.
Obtain a list of VirtualMachines with an incorrectly configured storage class by running the following PromQL query:
$ kubevirt_ssp_vm_rbd_volume{rxbounce_enabled="false", volume_mode="Block"} == 1
kubevirt_ssp_vm_rbd_volume{name="testvmi-gwgdqp22k7", namespace="test_ns", pv_name="testvmi-gwgdqp22k7", rxbounce_enabled="false", volume_mode="Block"} 1
The output displays a list of VirtualMachines that use a storage class without
rxbounce_enabled
.
Obtain the storage class name by running the following command:
$ kubectl get pv ${PV_NAME} -o=jsonpath='{.spec.storageClassName}'
It is recommended to create a dedicated StorageClass with “krbd:rxbounce” map option for the disks of virtual machines, to use a bounce buffer when receiving data. The default behavior is to read directly into the destination buffer. A bounce buffer is required if the stability of the destination buffer cannot be guaranteed.
Note that changing the used storage class will not have any impact on existing PVCs/VMs, meaning that new VMs will be provisioned with the optimized settings but existing VMs need to be transitioned or the alert will continue to fire for those.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: vm-sc
parameters:
# ...
mounter: rbd
mapOptions: "krbd:rxbounce"
provisioner: openshift-storage.rbd.csi.ceph.com
# ...
If you cannot resolve the issue, see the following resources: