
VirtHandlerRESTErrorsHigh

Meaning

More than 5% of REST calls failed in virt-handler in the last 60 minutes. This alert usually indicates that the virt-handler pods have partially lost connection to the API server.
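
To inspect the underlying error ratio yourself, you can query Prometheus directly. The following is a minimal sketch; it assumes the alert is built on the standard client-go rest_client_requests_total metric and that the Prometheus HTTP API is reachable at $PROMETHEUS_URL, neither of which is stated in this runbook:

$ curl -sG "$PROMETHEUS_URL/api/v1/query" \
    --data-urlencode 'query=sum(rate(rest_client_requests_total{pod=~"virt-handler-.*",code=~"(4|5)[0-9][0-9]"}[60m])) / sum(rate(rest_client_requests_total{pod=~"virt-handler-.*"}[60m]))'

A result above 0.05 corresponds to the 5% failure threshold of this alert.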

This error is frequently caused by one of the following problems:

  * The API server is overloaded, which causes timeouts. To verify whether this is the case, check the API server metrics and view its response times and overall request rates.

  * The virt-handler pod cannot reach the API server. This is commonly caused by DNS issues or network connectivity problems on the node.

Impact

Node-related actions, such as starting and migrating workloads, are delayed on the node that virt-handler is running on. Running workloads are not affected, but reporting their current status might be delayed.

Diagnosis

  1. Set the NAMESPACE environment variable:

    $ export NAMESPACE="$(kubectl get kubevirt -A -o custom-columns="":.metadata.namespace)"
    
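    The namespace depends on how KubeVirt was deployed; the value below is only an example (in many deployments it is kubevirt):

    $ echo $NAMESPACE
    kubevirt
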
  2. List the available virt-handler pods to identify the failing virt-handler pod:

    $ kubectl get pods -n $NAMESPACE -l=kubevirt.io=virt-handler
    
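    Because virt-handler runs as a DaemonSet, you should see one pod per node. The pod names and ages below are illustrative only:

    NAME                 READY   STATUS    RESTARTS   AGE
    virt-handler-6czp7   1/1     Running   0          3d8h
    virt-handler-vzqmm   1/1     Running   0          3d8h
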
  3. Check the failing virt-handler pod log for error messages when connecting to the API server:

    $ kubectl logs -n $NAMESPACE <virt-handler>
    

    Example error message:

    {"component":"virt-handler","level":"error","msg":"Can't patch node my-node","pos":"heartbeat.go:96","reason":"the server has received too many API requests and has asked us to try again later","timestamp":"2023-11-06T11:11:41.099883Z","uid":"132c50c2-8d82-4e49-8857-dc737adcd6cc"}
    
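    Because the log can be long, it may help to filter it for connection-related messages. The search terms in this sketch are illustrative, not an exhaustive list:

    $ kubectl logs -n $NAMESPACE <virt-handler> | grep -Ei 'error|timeout|connection refused'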

Mitigation

If the virt-handler pod cannot connect to the API server, delete the pod to force a restart:

$ kubectl delete -n $NAMESPACE <virt-handler>
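
Because virt-handler is managed by a DaemonSet, the deleted pod is recreated automatically. You can watch the replacement pod start and reach the Running state:

$ kubectl get pods -n $NAMESPACE -l=kubevirt.io=virt-handler -w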

If you cannot resolve the issue, see the following resources: