network port misconfiguration on dal-sw-01
I've been trying to use a third port (eth2) on our server dal-node-03, and it seems there's no carrier on that port:
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master br0 state UP mode DEFAULT group default qlen 1000
link/ether 3c:ec:ef:d5:6a:e8 brd ff:ff:ff:ff:ff:ff
altname eno1np0
altname enp200s0f0np0
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 3c:ec:ef:d5:6a:e9 brd ff:ff:ff:ff:ff:ff
altname eno2np1
altname enp200s0f1np1
4: eth2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
link/ether 3c:ec:ef:c0:1b:6a brd ff:ff:ff:ff:ff:ff
altname enp129s0f0
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 3c:ec:ef:c0:1b:6b brd ff:ff:ff:ff:ff:ff
altname enp129s0f1
I logged into the switch's web interface and confirmed that the "operational port status" is "Link Down".
I've also noticed that eth3 (the fourth port) on dal-node-03 is connected to port 11 on the switch instead of the expected 12, at least as far as I can tell from here. dal-node-02 has a similar misconfiguration: its eth2 and eth3 show up on switch ports 8 and 7 instead of 7 and 8.
Here's the status of LLDP discovery on the two affected servers:
root@dal-node-02:~# lldpcli show neighbors
-------------------------------------------------------------------------------
LLDP neighbors:
-------------------------------------------------------------------------------
Interface: eth0, via: LLDP, RID: 1, Time: 0 day, 00:00:50
Chassis:
ChassisID: mac 28:f1:0e:f2:b3:28
SysName: dal-sw-01
Port:
PortID: ifname Gi1/0/5
PortDescr: dal-node-02, port 1
TTL: 120
-------------------------------------------------------------------------------
Interface: eth1, via: LLDP, RID: 1, Time: 0 day, 00:00:46
Chassis:
ChassisID: mac 28:f1:0e:f2:b3:28
SysName: dal-sw-01
Port:
PortID: ifname Gi1/0/6
PortDescr: dal-node-02, port 2
TTL: 120
-------------------------------------------------------------------------------
Interface: eth2, via: LLDP, RID: 1, Time: 0 day, 00:00:50
Chassis:
ChassisID: mac 28:f1:0e:f2:b3:28
SysName: dal-sw-01
Port:
PortID: ifname Gi1/0/8
PortDescr: dal-node-02, port 4
TTL: 120
-------------------------------------------------------------------------------
Interface: eth3, via: LLDP, RID: 1, Time: 0 day, 00:00:49
Chassis:
ChassisID: mac 28:f1:0e:f2:b3:28
SysName: dal-sw-01
Port:
PortID: ifname Gi1/0/7
PortDescr: dal-node-02, port 3
TTL: 120
-------------------------------------------------------------------------------
root@dal-node-02:~#
root@dal-node-03:~# lldpcli show neighbors
-------------------------------------------------------------------------------
LLDP neighbors:
-------------------------------------------------------------------------------
Interface: eth0, via: LLDP, RID: 3, Time: 0 day, 00:47:26
Chassis:
ChassisID: mac 28:f1:0e:f2:b3:28
SysName: dal-sw-01
Port:
PortID: ifname Gi1/0/9
PortDescr: dal-node-03, port 1
TTL: 120
-------------------------------------------------------------------------------
Interface: eth1, via: LLDP, RID: 3, Time: 0 day, 00:47:26
Chassis:
ChassisID: mac 28:f1:0e:f2:b3:28
SysName: dal-sw-01
Port:
PortID: ifname Gi1/0/10
PortDescr: dal-node-03, port 2
TTL: 120
-------------------------------------------------------------------------------
Interface: eth3, via: LLDP, RID: 3, Time: 0 day, 00:45:54
Chassis:
ChassisID: mac 28:f1:0e:f2:b3:28
SysName: dal-sw-01
Port:
PortID: ifname Gi1/0/11
PortDescr: dal-node-03, port 3
TTL: 120
-------------------------------------------------------------------------------
root@dal-node-03:~#
Here's the correct configuration, on the first server:
root@dal-node-01:~# lldpcli show neighbors
-------------------------------------------------------------------------------
LLDP neighbors:
-------------------------------------------------------------------------------
Interface: eth0, via: LLDP, RID: 1, Time: 6 days, 03:22:41
Chassis:
ChassisID: mac 28:f1:0e:f2:b3:28
SysName: dal-sw-01
Port:
PortID: ifname Gi1/0/1
PortDescr: dal-node-01, port 1
TTL: 120
-------------------------------------------------------------------------------
Interface: eth1, via: LLDP, RID: 1, Time: 6 days, 03:22:42
Chassis:
ChassisID: mac 28:f1:0e:f2:b3:28
SysName: dal-sw-01
Port:
PortID: ifname Gi1/0/2
PortDescr: dal-node-01, port 2
TTL: 120
-------------------------------------------------------------------------------
Interface: eth2, via: LLDP, RID: 1, Time: 0 day, 00:05:27
Chassis:
ChassisID: mac 28:f1:0e:f2:b3:28
SysName: dal-sw-01
Port:
PortID: ifname Gi1/0/3
PortDescr: dal-node-01, port 3
TTL: 120
-------------------------------------------------------------------------------
Interface: eth3, via: LLDP, RID: 1, Time: 0 day, 00:05:48
Chassis:
ChassisID: mac 28:f1:0e:f2:b3:28
SysName: dal-sw-01
Port:
PortID: ifname Gi1/0/4
PortDescr: dal-node-01, port 4
TTL: 120
-------------------------------------------------------------------------------
root@dal-node-01:~#
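The comparison above can be automated. The following is a minimal sketch, not a definitive tool: it parses the text printed by `lldpcli show neighbors` and flags interfaces whose switch port description does not match the interface. It assumes the convention seen on dal-node-01, namely that `ethN` should land on a switch port described as `<hostname>, port N+1`. The function names and the `SAMPLE` excerpt (copied from the dal-node-02 output) are only for illustration.

```python
import re

# Excerpt of the dal-node-02 output above, used as sample input.
SAMPLE = """\
Interface: eth1, via: LLDP, RID: 1, Time: 0 day, 00:00:46
Chassis:
ChassisID: mac 28:f1:0e:f2:b3:28
SysName: dal-sw-01
Port:
PortID: ifname Gi1/0/6
PortDescr: dal-node-02, port 2
TTL: 120
Interface: eth2, via: LLDP, RID: 1, Time: 0 day, 00:00:50
Chassis:
ChassisID: mac 28:f1:0e:f2:b3:28
SysName: dal-sw-01
Port:
PortID: ifname Gi1/0/8
PortDescr: dal-node-02, port 4
TTL: 120
"""


def parse_lldp_neighbors(text):
    """Return {interface: (switch_port, port_descr)} from lldpcli text output."""
    neighbors = {}
    iface = port_id = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"Interface:\s*(\S+?),", line)
        if m:
            # A new neighbor block starts; forget the previous PortID.
            iface, port_id = m.group(1), None
            continue
        m = re.match(r"PortID:\s*ifname\s+(\S+)", line)
        if m:
            port_id = m.group(1)
            continue
        m = re.match(r"PortDescr:\s*(.+)", line)
        if m and iface is not None:
            neighbors[iface] = (port_id, m.group(1).strip())
    return neighbors


def find_mismatches(hostname, neighbors):
    """List (iface, switch_port, actual_descr, expected_descr) for miswired ports.

    Assumption: ethN is expected on a port described as '<hostname>, port N+1'.
    """
    problems = []
    for iface, (port_id, descr) in sorted(neighbors.items()):
        m = re.fullmatch(r"eth(\d+)", iface)
        if not m:
            continue  # ignore bridges and other non-ethN interfaces
        expected = f"{hostname}, port {int(m.group(1)) + 1}"
        if descr != expected:
            problems.append((iface, port_id, descr, expected))
    return problems


if __name__ == "__main__":
    for problem in find_mismatches("dal-node-02", parse_lldp_neighbors(SAMPLE)):
        print(problem)
```

Note that an interface without carrier (like eth2 on dal-node-03) has no LLDP neighbor at all, so it will not appear in the result; it has to be caught separately, for example by comparing against the list of expected interfaces.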
I managed to recover port 12 by re-enabling autonegotiation. For some reason it was hardcoded to 1000 Mbps in the switch interface; switching that to auto fixed the link. But then it negotiated at 100 Mbps. A couple more ports sync at that speed as well.
So it looks like autonegotiation is failing with the Intel gigabit cards.
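To find the other ports that synced below gigabit from the server side, something like the sketch below could be run on each node. It reads the negotiated speed (in Mbit/s) that Linux exposes in `/sys/class/net/<interface>/speed`. The function name, the `eth` prefix filter and the 1000 Mbit/s threshold are assumptions chosen for this setup, not something taken from the switch or the driver.

```python
from pathlib import Path


def links_below_speed(expected_mbps=1000, sysfs_root="/sys/class/net", prefix="eth"):
    """Return {interface: speed} for links negotiated below expected_mbps.

    The speed is None when it cannot be determined, which is typically
    the case when the interface has no carrier.
    """
    slow = {}
    for iface_dir in sorted(Path(sysfs_root).iterdir()):
        name = iface_dir.name
        if not name.startswith(prefix):
            continue  # skip lo, bridges, and so on
        speed_file = iface_dir / "speed"
        if not speed_file.is_file():
            continue
        try:
            speed = int(speed_file.read_text().strip())
        except (OSError, ValueError):
            # Reading the attribute can fail when the link is down.
            slow[name] = None
            continue
        if speed < 0:
            slow[name] = None  # the kernel reports -1 for "unknown"
        elif speed < expected_mbps:
            slow[name] = speed
    return slow


if __name__ == "__main__":
    for iface, speed in links_below_speed().items():
        print(iface, "unknown" if speed is None else f"{speed} Mbit/s")
```

This only shows what the server side negotiated; the switch's own view of each port still has to be checked in its interface.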
We're looking at moving storage traffic to those new ports because we're having an extremely odd and worrisome corruption issue on the storage VLAN (#41176 (closed)).
So here's what needs to happen:
- swap ports 3 and 4 on dal-node-02 (or confirm they are correctly wired), ports 8 and 7 on the switch
- swap ports 3 and 4 on dal-node-03 (or confirm they are correctly wired), ports 12 and 11 on the switch
- replace the cable on port 12
- fix autonegotiation on all ports so they link at 1000 Mbps (gigabit!)