NodeNetworkInterfaceDown

Meaning

This alert fires when one or more network interfaces on a node have been down for more than 5 minutes. The alert excludes virtual ethernet (veth) devices and bridge tunnels.

Impact

A down network interface can reduce or interrupt network connectivity for the node and for the workloads running on it.

Diagnosis

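  1. Identify the affected node and interfaces:
    # List the nodes with their addresses to find the affected node
    kubectl get nodes -o wide
    
    # Log in to the affected node
    ssh <node-address>
    
    # List the interfaces that report a DOWN state
    ip link show | grep -i down
    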
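  2. Check network interface details:
    # Show the addresses and state of every interface
    ip addr show
    
    # Show link status, speed, and duplex for a specific interface
    ethtool <interface-name>
    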
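  3. Review system logs for network-related issues:
    # NetworkManager logs (applies if NetworkManager manages the node's network)
    journalctl -u NetworkManager
    
    # Kernel messages that mention Ethernet devices or the affected interface
    dmesg | grep -i -E 'eth|<interface-name>'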
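
The interface check in step 1 can also be scripted. The following is a minimal sketch, not the alert's own logic: it reads each interface's operstate file under /sys/class/net and prints the interfaces that report "down". The function name list_down_interfaces and its optional directory argument are assumptions made for illustration. The exact device pattern the alert excludes is not stated here, so the sketch skips only names that start with "veth".

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of interfaces whose operstate is "down".
# Optional argument: the sysfs directory to scan (defaults to /sys/class/net).
list_down_interfaces() {
  local sysfs_net="${1:-/sys/class/net}"
  local dev name state
  for dev in "$sysfs_net"/*; do
    # Skip entries without an operstate file (also covers an empty directory).
    [ -e "$dev/operstate" ] || continue
    name="$(basename "$dev")"
    # Assumption: only veth* devices are skipped; adjust to match your alert rule.
    case "$name" in
      veth*) continue ;;
    esac
    state="$(cat "$dev/operstate")"
    if [ "$state" = "down" ]; then
      echo "$name"
    fi
  done
}
```

On the affected node, run list_down_interfaces with no arguments to scan the live interfaces. An empty result means no non-veth interface currently reports "down".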
    

Mitigation

  1. For physical interface issues:
    • Check physical cable connections
    • Verify switch port configuration
    • Test the interface with a different cable/port
  2. For software or configuration issues:
    # Restart NetworkManager
    systemctl restart NetworkManager
    
    # Bring interface up manually
    ip link set <interface-name> up
    
  3. If the issue persists:
    • Check network interface configuration files
    • Verify driver compatibility
    • If the failure is on a physical interface, consider hardware replacement
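
After applying one of the mitigations above, it can help to confirm that the interface actually comes back. The following is a rough sketch under stated assumptions: wait_for_interface_up is a hypothetical helper, not part of any tool, and it polls the interface's operstate file once per second until it reads "up" or a timeout expires.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until an interface reports operstate "up".
# Arguments: interface name, timeout in seconds (default 30),
#            sysfs directory (default /sys/class/net).
wait_for_interface_up() {
  local iface="$1"
  local timeout="${2:-30}"
  local sysfs_net="${3:-/sys/class/net}"
  local elapsed=0 state
  while :; do
    # A missing file leaves state empty, which is reported as "missing" below.
    state="$(cat "$sysfs_net/$iface/operstate" 2>/dev/null)"
    if [ "$state" = "up" ]; then
      echo "$iface is up"
      return 0
    fi
    if [ "$elapsed" -ge "$timeout" ]; then
      break
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "$iface is still ${state:-missing} after ${timeout}s" >&2
  return 1
}
```

For example, run wait_for_interface_up <interface-name> 60 right after ip link set <interface-name> up. A non-zero exit status suggests the interface did not recover within the timeout and that the next mitigation step is worth trying.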

Additional notes

If you cannot resolve the issue, see the following resources: