NodeNetworkInterfaceDown
Meaning
This alert fires when one or more network interfaces on a node have been down
for more than 5 minutes. The alert excludes virtual ethernet (veth) devices and
bridge tunnels.
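The following sketch is a rough local approximation of the condition described above, not the alert's actual rule. It lists interfaces whose kernel operational state (from `/sys/class/net/<interface>/operstate`) is `down` and skips veth devices. It assumes that bridge tunnel devices use a `tunbr` name prefix, which is a hypothetical pattern. It also ignores the 5-minute duration.

```shell
#!/usr/bin/env bash
# Rough local approximation of the alert condition (NOT the real alert rule).
# Assumption: bridge tunnel devices use a "tunbr" name prefix (hypothetical).
list_down_interfaces() {
  local sysroot="${1:-/sys/class/net}"   # directory with one entry per interface
  local dev name state
  for dev in "$sysroot"/*; do
    [ -e "$dev/operstate" ] || continue  # skip non-interfaces / empty glob
    name="$(basename "$dev")"
    case "$name" in
      veth*|tunbr*) continue ;;          # device types excluded by the alert
    esac
    read -r state < "$dev/operstate"
    if [ "$state" = "down" ]; then
      printf '%s\n' "$name"
    fi
  done
}
```

On the node, run `list_down_interfaces` with no arguments to read the live sysfs tree. The optional directory argument exists only to make the function easy to test.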
Impact
Network interface failures can lead to:
- Reduced network connectivity for pods on the affected node
- Potential service disruptions if critical network paths are affected
- Degraded cluster communication if management interfaces are impacted
Diagnosis
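- Identify the affected node and interface from the alert details, then list the interfaces that are down on that node:
  ip link show | grep -i down
- Check the details of the affected interface:
  ip -details link show <interface-name>
- Review the system logs for network-related issues:
  journalctl -u NetworkManager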
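The `grep -i down` filter is a loose text match on the full `ip link show` output. A slightly more precise alternative, sketched below, reads the brief output of `ip -br link show` (interface name, state, address, flags) and filters on the state column only. The veth and `tunbr` exclusions mirror the alert description; the `tunbr` prefix is an assumption.

```shell
#!/usr/bin/env bash
# Print names of interfaces whose state column is DOWN in `ip -br link show` output.
# Excludes veth devices and (assumed) "tunbr"-prefixed bridge tunnels.
down_from_brief() {
  awk '
    $2 == "DOWN" {
      name = $1
      sub(/@.*/, "", name)                # strip "@peer" suffix, e.g. veth0@if3
      if (name ~ /^(veth|tunbr)/) next    # skip excluded device types
      print name
    }
  '
}

# Example on the affected node:
#   ip -br link show | down_from_brief
```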
Mitigation
- For physical interface issues:
  - Check physical cable connections
  - Verify switch port configuration
  - Test the interface with a different cable or port
- For software or configuration issues (both commands require root privileges):
  # Restart NetworkManager
  systemctl restart NetworkManager
  # Bring the interface up manually
  ip link set <interface-name> up
- If the issue persists:
  - Check the network interface configuration files
  - Verify driver compatibility
  - If the failure is on a physical interface, consider hardware replacement
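After bringing an interface up, it can help to confirm that the kernel actually reports it as operational before considering the mitigation done. This minimal sketch polls `/sys/class/net/<interface>/operstate` once per second until it reads `up` or a timeout expires. The function name `wait_for_up` and the 30-second default are illustrative choices, not part of any standard tooling.

```shell
#!/usr/bin/env bash
# Poll an interface's operational state until it reports "up" or a timeout expires.
# Hypothetical helper; the default timeout is an illustrative value.
wait_for_up() {
  local iface="$1"
  local timeout="${2:-30}"               # seconds to wait
  local sysroot="${3:-/sys/class/net}"   # overridable for testing
  local state elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    if [ -r "$sysroot/$iface/operstate" ]; then
      read -r state < "$sysroot/$iface/operstate"
      if [ "$state" = "up" ]; then
        echo "$iface is up after ${elapsed}s"
        return 0
      fi
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "$iface did not come up within ${timeout}s (state: ${state:-unknown})" >&2
  return 1
}
```

For example, on the node: run `ip link set <interface-name> up`, then `wait_for_up <interface-name> 60`. A non-zero exit status means the interface is still not up and the next mitigation step applies.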
Additional notes
- Monitor interface status after mitigation to confirm that the interface stays up and the alert clears
- Document any hardware replacements or configuration changes
- Consider implementing network redundancy for critical interfaces
If you cannot resolve the issue, see the following resources: