Issue #31805: Closed (moved)
Created Sep 19, 2019 by anarcat (@anarcat)

fsn-node-02 instability issues

fsn-node-02 seems to have problems staying up. It crashed once yesterday at ~13:00 EDT and again twice today, at 13:34 and 14:48.

I opened the following ticket with Hetzner:

We have had problems with this host during the week. This is now the second time that we have had to do a hard reset. The network first hangs, then the kernel resets the controller, with a pattern like this:

Sep 17 06:26:18 fsn-node-02/fsn-node-02/::ffff:88.198.8.87 kernel: e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang:
Sep 17 06:26:18 fsn-node-02/fsn-node-02/::ffff:88.198.8.87 kernel: e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang:
Sep 17 06:26:18 fsn-node-02/fsn-node-02/::ffff:88.198.8.87 kernel: e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang:
Sep 17 06:26:18 fsn-node-02/fsn-node-02/::ffff:88.198.8.87 kernel: e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang:
Sep 17 06:26:18 fsn-node-02/fsn-node-02/::ffff:88.198.8.87 kernel: e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang:
Sep 17 06:26:18 fsn-node-02/fsn-node-02/::ffff:88.198.8.87 kernel: e1000e 0000:00:1f.6 eth0: Reset adapter unexpectedly
Sep 17 06:26:18 fsn-node-02/fsn-node-02/::ffff:88.198.8.87 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Sep 17 06:56:44 fsn-node-02/fsn-node-02/::ffff:88.198.8.87 kernel: e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang:
Sep 17 06:56:44 fsn-node-02/fsn-node-02/::ffff:88.198.8.87 kernel: e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang:
Sep 17 06:56:44 fsn-node-02/fsn-node-02/::ffff:88.198.8.87 kernel: e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang:
Sep 17 06:56:44 fsn-node-02/fsn-node-02/::ffff:88.198.8.87 kernel: e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang:
Sep 17 06:56:44 fsn-node-02/fsn-node-02/::ffff:88.198.8.87 kernel: e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang:
Sep 17 06:56:44 fsn-node-02/fsn-node-02/::ffff:88.198.8.87 kernel: e1000e 0000:00:1f.6 eth0: Reset adapter unexpectedly
Sep 17 06:56:44 fsn-node-02/fsn-node-02/::ffff:88.198.8.87 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Sep 17 06:57:18 fsn-node-02/fsn-node-02/::ffff:88.198.8.87 kernel: e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang:
Sep 17 06:57:18 fsn-node-02/fsn-node-02/::ffff:88.198.8.87 kernel: e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang:
Sep 17 06:57:18 fsn-node-02/fsn-node-02/::ffff:88.198.8.87 kernel: e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang:
Sep 17 06:57:18 fsn-node-02/fsn-node-02/::ffff:88.198.8.87 kernel: e1000e 0000:00:1f.6 eth0: Reset adapter unexpectedly
Sep 17 06:57:18 fsn-node-02/fsn-node-02/::ffff:88.198.8.87 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx

This seems to happen more or less randomly. Eventually, the entire server becomes unreachable and only a hard reset restores it to a proper state. We only have those logs because they are sent to an external server.

They annoyingly stripped out part of that request so I lost part of it. But basically I asked them to investigate this as a hard problem.
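
To see how often the hang/reset pattern recurs in the logs kept on the external server, something like the following could tally events per timestamp. This is only a rough sketch: the function name `summarize_e1000e` and the regular expression are assumptions based on the excerpt quoted in the ticket, not an existing tool, and other syslog formats would need a different pattern.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt in the ticket:
#   "Sep 17 06:26:18 <host field> kernel: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+(?P<msg>.*)$"
)


def summarize_e1000e(lines):
    """Count e1000e hang detections and adapter resets per timestamp."""
    hangs = Counter()
    resets = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not look like kernel messages
        ts, msg = match["ts"], match["msg"]
        if "Detected Hardware Unit Hang" in msg:
            hangs[ts] += 1
        elif "Reset adapter unexpectedly" in msg:
            resets[ts] += 1
    return hangs, resets


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python summarize.py < remote-syslog.log
    hangs, resets = summarize_e1000e(sys.stdin)
    for ts in sorted(set(hangs) | set(resets)):
        print(f"{ts}: {hangs[ts]} hang(s), {resets[ts]} reset(s)")
```

Grouping by timestamp keeps each burst of hang messages together with the reset that follows it, which is the pattern described in the ticket.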
