VirtAPIDown

Meaning

No running virt-api pod has been detected for 10 minutes.

The alert expression evaluates cluster:kubevirt_virt_api_pods_running:count == 0 with a for duration of 10 minutes. The recording rule counts pods in Running phase matching virt-api-.*.
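
As a rough manual cross-check of what the recording rule counts, the sketch below tallies pods in the Running phase whose names start with virt-api-. The function name count_virt_api_pods is hypothetical, and the helper only approximates the rule from kubectl output; it is not the rule's actual implementation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count virt-api pods from `kubectl get pods -o name`
# output supplied on stdin (one "pod/<name>" entry per line).
count_virt_api_pods() {
  awk '/^pod\/virt-api-/ { n++ } END { print n + 0 }'
}

# Example usage (requires cluster access and NAMESPACE set as in Diagnosis):
#   kubectl -n $NAMESPACE get pods --field-selector=status.phase=Running \
#     -o name | count_virt_api_pods
```

A result of 0 that persists corresponds to the condition the alert fires on.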

In newer versions of KubeVirt, the alert expression is reworked to surface additional diagnostic labels (pod, reason) when a container waiting reason is available. If your alert includes these labels, see step 1 of the diagnosis below.

Impact

While virt-api is down, API calls involving KubeVirt objects cannot be processed.

Diagnosis

  1. Check the alert labels:

    If the alert includes a reason label (for example, CrashLoopBackOff, ErrImagePull, ImagePullBackOff), it directly identifies why virt-api is down. The pod label identifies the affected pod. Skip to Mitigation for the matching root cause. If these labels are not present, continue with the steps below.

  2. Set the NAMESPACE environment variable:

    $ export NAMESPACE="$(kubectl get kubevirt -A \
        -o custom-columns="":.metadata.namespace)"
    
  3. Check the status of the virt-api pods:

    $ kubectl -n $NAMESPACE get pods -l kubevirt.io=virt-api
    
  4. Check the status of the virt-api deployment:

    $ kubectl -n $NAMESPACE get deploy virt-api -o yaml
    
  5. Check the virt-api deployment details for issues such as crashing pods or image pull failures:

    $ kubectl -n $NAMESPACE describe deploy virt-api
    
  6. Review the logs of the virt-api pods:

    $ kubectl -n $NAMESPACE logs -l kubevirt.io=virt-api --previous
    $ kubectl -n $NAMESPACE logs -l kubevirt.io=virt-api
    
  7. Check for issues such as nodes in a NotReady state:

    $ kubectl get nodes
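
Steps 3 to 7 can be partly scripted. The sketch below is one possible approach and not part of KubeVirt: the function names waiting_reason_hint and not_ready_nodes are hypothetical, and the hint strings only point back to the diagnosis steps above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names and hint text are illustrative only.

# Map a container waiting reason to the diagnosis step most likely to help.
waiting_reason_hint() {
  case "$1" in
    CrashLoopBackOff)
      echo "container is crashing: review the previous logs (step 6)" ;;
    ErrImagePull|ImagePullBackOff)
      echo "image pull failure: check the deployment details (step 5)" ;;
    "")
      echo "no waiting reason reported: continue with the remaining steps" ;;
    *)
      echo "unrecognized reason '$1': inspect the pod manually" ;;
  esac
}

# Print the names of nodes whose STATUS column starts with NotReady.
# Reads `kubectl get nodes --no-headers` output on stdin.
not_ready_nodes() {
  awk '$2 ~ /^NotReady/ { print $1 }'
}

# Example usage (requires cluster access):
#   kubectl -n $NAMESPACE get pods -l kubevirt.io=virt-api \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].state.waiting.reason}{"\n"}{end}'
#   kubectl get nodes --no-headers | not_ready_nodes
```

The jsonpath query prints each pod name followed by any container waiting reason, which can then be passed to waiting_reason_hint.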
    

Mitigation

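Try to identify the root cause and resolve the issue. Common causes, based on the diagnosis steps above, include crashing virt-api containers (CrashLoopBackOff), image pull failures (ErrImagePull, ImagePullBackOff), and nodes in a NotReady state.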

If you cannot resolve the issue, see the following resources: