This alert fires when
cluster:kubevirt_virt_operator_pods_running:count == 0 for 10
minutes, meaning that no virt-operator pod in the Running phase has been
detected.
The virt-operator is the first Operator to start in a cluster. Its
primary responsibilities include the following:
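- Installing, live-updating, and live-upgrading a cluster
- Monitoring the lifecycle of top-level controllers, such as
virt-controller, virt-handler, and virt-launcher, and managing
their reconciliation
- Certain cluster-wide tasks, such as certificate rotation
In newer versions of KubeVirt, the alert expression is reworked to
surface additional diagnostic labels (pod and reason) when a
container waiting reason is available. If your alert includes these
labels, see the first diagnosis step below (checking the alert labels).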
This alert indicates a cluster-level failure. Critical cluster-wide management functions, such as certificate rotation, upgrades, and reconciliation of controllers, might be unavailable.
The virt-operator is not directly responsible for virtual machines
(VMs) in the cluster. Therefore, its temporary unavailability does
not significantly affect VM workloads.
Check the alert labels:
If the alert includes a reason label (for example,
CrashLoopBackOff, ErrImagePull, or ImagePullBackOff), that label
directly identifies why virt-operator is down, and the pod label
identifies the affected pod. In that case, skip to the
mitigation steps for the matching root cause. If these
labels are not present, continue with the steps below.
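If your alert does not carry the reason label, you can approximate it from the pod status. The following is a minimal sketch, not part of KubeVirt's tooling: the function name virt_operator_waiting_reasons is hypothetical, and it assumes the namespace is passed as the first argument and that each virt-operator pod runs a single container.

```shell
# Hypothetical helper (not a KubeVirt tool): print "<pod><TAB><waiting reason>"
# for every virt-operator pod whose container is in a waiting state. This is
# roughly what newer alert versions expose through their pod and reason labels.
virt_operator_waiting_reasons() {
  local ns="$1"
  kubectl -n "$ns" get pods -l kubevirt.io=virt-operator \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].state.waiting.reason}{"\n"}{end}' \
    | awk -F'\t' '$2 != ""'   # keep only pods that report a waiting reason
}
```

Once NAMESPACE is set as described in the next step, run it as `$ virt_operator_waiting_reasons "$NAMESPACE"`.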
Set the NAMESPACE environment variable:
$ export NAMESPACE="$(kubectl get kubevirt -A \
-o custom-columns="":.metadata.namespace)"
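The command above assumes that exactly one KubeVirt custom resource exists in the cluster. A minimal guard, sketched under the assumption of a bash-like shell; check_kubevirt_namespace is a hypothetical helper name:

```shell
# Hypothetical helper: accept the captured NAMESPACE value and print it back
# only if it contains exactly one namespace. Word splitting also drops any
# surrounding whitespace or blank lines in the captured value.
check_kubevirt_namespace() {
  # Intentionally unquoted: split the value into words (namespace names contain no spaces).
  set -- $1
  if [ "$#" -ne 1 ]; then
    echo "expected exactly one KubeVirt namespace, found $#: $*" >&2
    return 1
  fi
  echo "$1"
}
```

For example: `$ NAMESPACE="$(check_kubevirt_namespace "$NAMESPACE")"`. If it fails, list the KubeVirt resources with `kubectl get kubevirt -A` and set NAMESPACE by hand.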
Check the status of the virt-operator deployment:
$ kubectl -n $NAMESPACE get deploy virt-operator -o yaml
Obtain the details of the virt-operator deployment:
$ kubectl -n $NAMESPACE describe deploy virt-operator
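To tell a deployment that was scaled to zero apart from one whose pods are failing, compare the desired and available replica counts. This is a sketch only; virt_operator_replica_summary and its output strings are hypothetical.

```shell
# Hypothetical helper: compare desired and available replicas of the
# virt-operator deployment and print a one-line verdict.
virt_operator_replica_summary() {
  local ns="$1" out desired available
  out="$(kubectl -n "$ns" get deploy virt-operator \
    -o jsonpath='{.spec.replicas} {.status.availableReplicas}')" || return 1
  read -r desired available <<EOF
$out
EOF
  # status.availableReplicas is omitted from the object when it is 0.
  available="${available:-0}"
  if [ "$desired" = "0" ]; then
    echo "scaled-to-zero"
  elif [ "$available" -lt "$desired" ]; then
    echo "degraded $available/$desired available"
  else
    echo "ok $available/$desired available"
  fi
}
```

Run it as `$ virt_operator_replica_summary "$NAMESPACE"`. A non-zero exit status usually means the deployment itself is missing.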
Check the status of the virt-operator pods:
$ kubectl -n $NAMESPACE get pods \
-l kubevirt.io=virt-operator
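A quick way to see roughly what the alert sees is to count the virt-operator pods in the Running phase. This approximates the cluster:kubevirt_virt_operator_pods_running:count expression but is not the recording rule itself; the function name is hypothetical.

```shell
# Hypothetical helper: count virt-operator pods in the Running phase.
count_running_virt_operator_pods() {
  # "No resources found" is written to stderr, so an empty result counts as 0.
  # tr strips the padding that some wc implementations add.
  kubectl -n "$1" get pods -l kubevirt.io=virt-operator \
    --field-selector=status.phase=Running --no-headers 2>/dev/null \
    | wc -l | tr -d ' '
}
```

A result of 0 from `$ count_running_virt_operator_pods "$NAMESPACE"` matches the alert condition.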
Review the logs of the virt-operator pods:
$ kubectl -n $NAMESPACE logs -l kubevirt.io=virt-operator \
--previous
$ kubectl -n $NAMESPACE logs -l kubevirt.io=virt-operator
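Logs from a crashed container do not always state why it stopped. The restart count and the last termination reason from the pod status can confirm an OOM kill, for example. This is a sketch assuming one container per pod; virt_operator_restarts is a hypothetical name.

```shell
# Hypothetical helper: for each virt-operator pod that has restarted, print the
# restart count and the reason its previous container instance terminated
# (for example OOMKilled or Error). Assumes one container per pod.
virt_operator_restarts() {
  kubectl -n "$1" get pods -l kubevirt.io=virt-operator \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].restartCount}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}' \
    | awk -F'\t' '$2 + 0 > 0 {
        printf "%s restarts=%s last=%s\n", $1, $2, ($3 == "" ? "unknown" : $3)
      }'
}
```

Run it as `$ virt_operator_restarts "$NAMESPACE"`. Pods that have never restarted are omitted from the output.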
Check for node issues, such as a NotReady state:
$ kubectl get nodes
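On larger clusters, it can help to filter the node list down to nodes that are not plainly Ready. This sketch relies on the default `kubectl get nodes` column layout (STATUS in the second column); not_ready_nodes is a hypothetical name.

```shell
# Hypothetical helper: list nodes whose STATUS column is not exactly "Ready",
# such as NotReady or Ready,SchedulingDisabled (a cordoned node).
not_ready_nodes() {
  kubectl get nodes --no-headers | awk '$2 != "Ready" { print $1, $2 }'
}
```

Cordoned nodes are included on purpose, because they can also prevent virt-operator pods from being scheduled.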
Try to identify the root cause and resolve the issue. Common causes include the following:
- The virt-operator container is crashing
repeatedly (CrashLoopBackOff). Check the pod logs for the root cause (panic, OOM,
misconfiguration).
- No virt-operator pods exist. Check whether the
deployment has been scaled to zero, deleted, or blocked by
resource constraints.
- Nodes are in a NotReady state, under resource
pressure, or have scheduling constraints that prevent the pods
from running.
If you cannot resolve the issue, see the following resources: