More than 5% of REST calls failed in virt-handler in the last 60 minutes. This alert usually indicates that the virt-handler pods have partially lost connection to the API server.
This error is frequently caused by one of the following problems:

- The API server is overloaded, which causes timeouts. To verify whether this is the case, check the API server metrics and review its response times and overall call volume (see the example after this list).
- The virt-handler pod cannot reach the API server. This is commonly caused by DNS issues or network connectivity issues on the node.
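To check the first cause, you can inspect the API server's own metrics. The following is a minimal sketch, assuming your user is allowed to query the raw /metrics endpoint; in clusters with Prometheus, the same apiserver_request_total and apiserver_request_duration_seconds series can be queried there instead:

$ kubectl get --raw /metrics | grep '^apiserver_request_total' | head               # overall call counts per verb and resource
$ kubectl get --raw /metrics | grep '^apiserver_request_duration_seconds' | head    # response-time histogram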
Node-related actions, such as starting and migrating workloads, are delayed on the node that virt-handler is running on. Running workloads are not affected, but reporting their current status might be delayed.
Set the NAMESPACE environment variable:
$ export NAMESPACE="$(kubectl get kubevirt -A -o custom-columns="":.metadata.namespace)"
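Optionally, confirm that the variable was set. A typical value is kubevirt, or openshift-cnv on OpenShift, depending on your installation:

$ echo $NAMESPACE
kubevirt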
List the available virt-handler pods to identify the failing virt-handler pod:
$ kubectl get pods -n $NAMESPACE -l=kubevirt.io=virt-handler
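Hypothetical output (the pod names are illustrative): a high restart count or a status other than Running points to the failing pod:

NAME                 READY   STATUS    RESTARTS   AGE
virt-handler-2sh9c   1/1     Running   12         3d2h
virt-handler-ktx7q   1/1     Running   0          3d2h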
Check the failing virt-handler pod log for error messages when connecting to the API server:
$ kubectl logs -n $NAMESPACE <virt-handler>
Example error message:
{"component":"virt-handler","level":"error","msg":"Can't patch node my-node","pos":"heartbeat.go:96","reason":"the server has received too many API requests and has asked us to try again later","timestamp":"2023-11-06T11:11:41.099883Z","uid":"132c50c2-8d82-4e49-8857-dc737adcd6cc"}
If the virt-handler pod cannot connect to the API server, delete the pod to force a restart:
$ kubectl delete -n $NAMESPACE <virt-handler>
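Because virt-handler runs as a DaemonSet, a replacement pod is scheduled automatically. As a quick check, watch for the new pod and confirm that its log no longer shows API server errors (the pod name placeholder is illustrative):

$ kubectl get pods -n $NAMESPACE -l=kubevirt.io=virt-handler --watch
$ kubectl logs -n $NAMESPACE <new-virt-handler-pod> | grep -i error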
If you cannot resolve the issue, see the following resources: