HighCPUWorkload
Meaning
This alert fires when a node’s CPU utilization exceeds 90% for more than 5 minutes.
Impact
High CPU utilization can lead to:
- Degraded performance of applications running on the node
- Increased latency in request processing
- Potential service disruptions if CPU usage continues to climb
Diagnosis
- Identify the affected node:
- Check the node’s resource usage:
kubectl describe node <node-name>
- List pods across all namespaces, sorted by CPU usage:
kubectl top pods --all-namespaces --sort-by=cpu
- Investigate specific pod details if needed:
kubectl describe pod <pod-name> -n <namespace>
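The diagnosis steps above can be narrowed to a single node and a CPU threshold. The following is a minimal sketch, not part of the original runbook: the function names `pods_on_node` and `pods_over_cpu` and the 500m threshold are hypothetical, and the parsing assumes `kubectl top` prints CPU either in millicores (for example `250m`) or in whole cores.

```shell
# List every pod scheduled on one node (uses the spec.nodeName field selector).
pods_on_node() {
  kubectl get pods --all-namespaces --field-selector "spec.nodeName=$1" -o wide
}

# Filter `kubectl top pods --all-namespaces --no-headers` output (read from stdin)
# down to pods using more than the given number of millicores.
# Assumed column layout: NAMESPACE NAME CPU MEMORY
pods_over_cpu() {
  awk -v limit="$1" '
    {
      cpu = $3
      if (cpu ~ /m$/) { sub(/m$/, "", cpu); millis = cpu + 0 }   # e.g. 250m
      else            { millis = cpu * 1000 }                     # e.g. 2 (whole cores)
      if (millis > limit) print $1 "/" $2, millis "m"
    }'
}

# Possible usage against a live cluster (500 is an illustrative threshold):
#   kubectl top nodes --sort-by=cpu
#   pods_on_node <node-name>
#   kubectl top pods --all-namespaces --no-headers | pods_over_cpu 500
```

Reading from stdin keeps the filter independent of the cluster, so it can be tried on saved `kubectl top` output before running it live.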
Mitigation
- If the issue is caused by a malfunctioning pod:
  - Check pod logs for anomalies
  - Review pod resource limits and requests
  - Consider restarting the pod
- If the issue is system-wide:
  - Check for system processes that consume high amounts of CPU
  - Consider cordoning the node and migrating workloads
  - Evaluate if node scaling is needed
- Long-term solutions to avoid the issue:
  - Implement or adjust pod resource limits
  - Consider horizontal pod autoscaling
  - Evaluate cluster capacity and scaling needs
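Two of the mitigation steps above, cordoning a node and adjusting CPU limits, map to standard `kubectl` subcommands. The sketch below is illustrative only: the helper names `run`, `evacuate_node`, and `set_cpu_bounds`, the `DRY_RUN` variable, and all example values are assumptions, and the right drain flags depend on the workloads in your cluster.

```shell
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Stop new pods from landing on the node, then evict the existing ones.
evacuate_node() {
  run kubectl cordon "$1"
  run kubectl drain "$1" --ignore-daemonsets --delete-emptydir-data
}

# Set CPU requests and limits on a deployment.
# $1 = namespace, $2 = deployment, $3 = CPU request, $4 = CPU limit
set_cpu_bounds() {
  run kubectl set resources deployment "$2" -n "$1" --requests="cpu=$3" --limits="cpu=$4"
}

# Possible usage (names and values are placeholders):
#   DRY_RUN=1 evacuate_node <node-name>
#   set_cpu_bounds <namespace> <deployment> 250m 500m
```

Running with `DRY_RUN=1` first shows exactly what would be executed. After the node recovers, `kubectl uncordon <node-name>` makes it schedulable again.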
Additional notes
- Monitor the node after mitigation to ensure CPU usage returns to normal
- Review application logs for potential root causes
- Consider updating resource requests/limits if this is a recurring issue
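To confirm that CPU usage has returned to normal after mitigation, the node's utilization can be compared against the 90% alert threshold. This is a rough sketch under stated assumptions: `node_cpu_below` is a hypothetical helper, and it assumes `kubectl top node --no-headers` prints the columns NAME, CPU(cores), CPU%, MEMORY(bytes), MEMORY% in that order.

```shell
# Succeeds (exit 0) only when every input line reports CPU% below the limit.
# Reads `kubectl top node <node-name> --no-headers` output from stdin.
# $1 = limit in percent (defaults to 90, the alert threshold)
node_cpu_below() {
  awk -v limit="${1:-90}" '
    {
      pct = $3
      sub(/%$/, "", pct)
      seen = 1
      # Treat a non-numeric value (metrics not available) as a failure.
      if (pct !~ /^[0-9]+$/ || pct + 0 >= limit) over = 1
    }
    END { if (seen && !over) exit 0; exit 1 }'
}

# Possible usage:
#   kubectl top node <node-name> --no-headers | node_cpu_below 90 && echo "CPU below 90%"
```

A single reading is only a snapshot. Because the alert requires more than 5 minutes above the threshold, repeating the check over a similar window gives a more reliable picture.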
If you cannot resolve the issue, see the following resources: