fsn-node-02 instability issues
fsn-node-02 seems to have trouble staying up. It crashed once yesterday at ~13:00 EDT and twice today, at 13:34 and 14:48.
I opened the following ticket with Hetzner:
> we have had problems with this host during the week. it's the second time now that we
> had to do a hard reset. network would first hang, then the controller would be reset
> by the kernel, with a pattern like this:
>
> Sep 17 06:26:18 fsn-node-02/fsn-node-02/::ffff:88.198.8.87 kernel: e1000e 0000:00:1f.6
> eth0: Detected Hardware Unit Hang:
> Sep 17 06:26:18 fsn-node-02/fsn-node-02/::ffff:88.198.8.87 kernel: e1000e 0000:00:1f.6
> eth0: Detected Hardware Unit Hang:
> Sep 17 06:26:18 fsn-node-02/fsn-node-02/::ffff:88.198.8.87 kernel: e1000e 0000:00:1f.6
> eth0: Detected Hardware Unit Hang:
> Sep 17 06:26:18 fsn-node-02/fsn-node-02/::ffff:88.198.8.87 kernel: e1000e 0000:00:1f.6
> eth0: Detected Hardware Unit Hang:
> Sep 17 06:26:18 fsn-node-02/fsn-node-02/::ffff:88.198.8.87 kernel: e1000e 0000:00:1f.6
> eth0: Detected Hardware Unit Hang:
> Sep 17 06:26:18 fsn-node-02/fsn-node-02/::ffff:88.198.8.87 kernel: e1000e 0000:00:1f.6
> eth0: Reset adapter unexpectedly
> Sep 17 06:26:18 fsn-node-02/fsn-node-02/::ffff:88.198.8.87 kernel: e1000e: eth0 NIC
> Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> Sep 17 06:56:44 fsn-node-02/fsn-node-02/::ffff:88.198.8.87 kernel: e1000e 0000:00:1f.6
> eth0: Detected Hardware Unit Hang:
> Sep 17 06:56:44 fsn-node-02/fsn-node-02/::ffff:88.198.8.87 kernel: e1000e 0000:00:1f.6
> eth0: Detected Hardware Unit Hang:
> Sep 17 06:56:44 fsn-node-02/fsn-node-02/::ffff:88.198.8.87 kernel: e1000e 0000:00:1f.6
> eth0: Detected Hardware Unit Hang:
> Sep 17 06:56:44 fsn-node-02/fsn-node-02/::ffff:88.198.8.87 kernel: e1000e 0000:00:1f.6
> eth0: Detected Hardware Unit Hang:
> Sep 17 06:56:44 fsn-node-02/fsn-node-02/::ffff:88.198.8.87 kernel: e1000e 0000:00:1f.6
> eth0: Detected Hardware Unit Hang:
> Sep 17 06:56:44 fsn-node-02/fsn-node-02/::ffff:88.198.8.87 kernel: e1000e 0000:00:1f.6
> eth0: Reset adapter unexpectedly
> Sep 17 06:56:44 fsn-node-02/fsn-node-02/::ffff:88.198.8.87 kernel: e1000e: eth0 NIC
> Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> Sep 17 06:57:18 fsn-node-02/fsn-node-02/::ffff:88.198.8.87 kernel: e1000e 0000:00:1f.6
> eth0: Detected Hardware Unit Hang:
> Sep 17 06:57:18 fsn-node-02/fsn-node-02/::ffff:88.198.8.87 kernel: e1000e 0000:00:1f.6
> eth0: Detected Hardware Unit Hang:
> Sep 17 06:57:18 fsn-node-02/fsn-node-02/::ffff:88.198.8.87 kernel: e1000e 0000:00:1f.6
> eth0: Detected Hardware Unit Hang:
> Sep 17 06:57:18 fsn-node-02/fsn-node-02/::ffff:88.198.8.87 kernel: e1000e 0000:00:1f.6
> eth0: Reset adapter unexpectedly
> Sep 17 06:57:18 fsn-node-02/fsn-node-02/::ffff:88.198.8.87 kernel: e1000e: eth0 NIC
> Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
>
> This seems to happen more or less randomly. Eventually, the entire server becomes
> unreachable and only a hard reset would restore it to a proper state. We only have
> those logs because they are sent to an external server.
Annoyingly, they stripped out part of that request, so I lost some of it. Basically, I asked them to investigate this as a hard problem.
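For anyone wanting to check how often the pattern recurs in the remote syslog, here is a rough sketch that counts the hang and reset messages per timestamp. It assumes the log format matches the excerpt quoted above (including the wrapped lines and optional `> ` quote prefixes); the function name `count_events` is made up for illustration and is not part of any existing tooling.

```python
import re
import sys
from collections import Counter

# Matches one e1000e event after wrapped lines have been joined back together.
# Assumption: the interface is eth0 and the messages read exactly as in the
# excerpt quoted in this ticket.
EVENT_RE = re.compile(
    r"(?P<ts>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: e1000e \S+ "
    r"eth0: (?P<event>Detected Hardware Unit Hang|Reset adapter unexpectedly)"
)


def count_events(log_text):
    """Return a Counter keyed by (timestamp, event) for hang/reset messages."""
    # Drop any leading "> " quote marker, then join wrapped lines with spaces.
    lines = [re.sub(r"^\s*>\s?", "", line).strip() for line in log_text.splitlines()]
    joined = " ".join(line for line in lines if line)
    counts = Counter()
    for match in EVENT_RE.finditer(joined):
        counts[(match.group("ts"), match.group("event"))] += 1
    return counts


if __name__ == "__main__":
    # Usage: python count_hangs.py < syslog-excerpt.txt
    for (ts, event), n in sorted(count_events(sys.stdin.read()).items()):
        print(f"{ts}  {event}: {n}")
```

The "NIC Link is Up" lines are deliberately not counted, since they only mark the adapter coming back after a reset.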