Closed load warnings on gnt-fsn: migrate some VMs to gnt-chi?
  • Closed Issue created by anarcat

    in the last week, we've had a few warnings from nagios about load being too high in the gnt-fsn cluster, particularly on fsn-node-0[12]:

    2020-11-12 13:57:05 <nsa> tor-nagios: [fsn-node-01] load is WARNING: WARNING - load average: 27.93, 28.07, 22.95
    2020-11-12 14:57:03 <nsa> tor-nagios: [fsn-node-01] load is OK: OK - load average: 23.70, 24.29, 25.25
    2020-11-12 16:22:08 <nsa> tor-nagios: [fsn-node-01] load is WARNING: WARNING - load average: 42.81, 38.54, 35.18
    2020-11-17 12:31:05 <nsa> tor-nagios: [fsn-node-01] load is WARNING: WARNING - load average: 23.08, 38.64, 37.58
    2020-11-17 13:46:05 <nsa> tor-nagios: [fsn-node-01] load is OK: OK - load average: 26.70, 27.10, 25.82
    2020-11-17 14:11:05 <nsa> tor-nagios: [fsn-node-01] load is WARNING: WARNING - load average: 25.37, 25.99, 27.78
    2020-11-18 03:50:04 <nsa> tor-nagios: [fsn-node-01] load is WARNING: WARNING - load average: 30.22, 34.05, 30.19
    2020-11-18 04:49:59 <nsa> tor-nagios: [fsn-node-01] load is OK: OK - load average: 26.40, 22.63, 23.75
    2020-11-18 05:15:04 <nsa> tor-nagios: [fsn-node-01] load is WARNING: WARNING - load average: 23.39, 28.03, 28.51
    2020-11-18 08:00:09 <nsa> tor-nagios: [fsn-node-01] load is OK: OK - load average: 2.99, 9.16, 18.38
    2020-11-19 04:06:12 <nsa> tor-nagios: [fsn-node-02] load is WARNING: WARNING - load average: 38.44, 35.68, 30.18
    2020-11-19 04:21:12 <nsa> tor-nagios: [fsn-node-02] load is OK: OK - load average: 11.93, 15.21, 20.84

    It might be worth trying to figure out what, exactly, in there is causing those load spikes (see grafana or related nagios warnings) and move some of that stuff to the other ganeti cluster.
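    as a first pass at that, the nagios lines above can be tallied per node. this is only a rough sketch: the regex assumes the exact IRC log format pasted above, and `summarize` and `SAMPLE` are made-up names for illustration, not part of any existing tooling. it counts WARNING notifications and tracks the highest 1-minute load seen per node; it says nothing about which VM is responsible, which still needs grafana.

```python
import re
from collections import defaultdict

# Assumed format, taken from the log excerpt in this issue:
# "<date> <time> <nsa> tor-nagios: [<node>] load is <STATE>: ... load average: a, b, c"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) <nsa> tor-nagios: "
    r"\[(?P<node>[^\]]+)\] load is (?P<state>\w+): "
    r".*load average: (?P<l1>[\d.]+), (?P<l5>[\d.]+), (?P<l15>[\d.]+)"
)


def summarize(lines):
    """Count WARNING notifications and track the peak 1-minute load per node."""
    summary = defaultdict(lambda: {"warnings": 0, "peak_1min": 0.0})
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a load notification
        entry = summary[match["node"]]
        if match["state"] == "WARNING":
            entry["warnings"] += 1
        entry["peak_1min"] = max(entry["peak_1min"], float(match["l1"]))
    return dict(summary)


# A few lines copied from the excerpt above, used as sample input.
SAMPLE = """\
2020-11-12 13:57:05 <nsa> tor-nagios: [fsn-node-01] load is WARNING: WARNING - load average: 27.93, 28.07, 22.95
2020-11-12 14:57:03 <nsa> tor-nagios: [fsn-node-01] load is OK: OK - load average: 23.70, 24.29, 25.25
2020-11-12 16:22:08 <nsa> tor-nagios: [fsn-node-01] load is WARNING: WARNING - load average: 42.81, 38.54, 35.18
2020-11-19 04:06:12 <nsa> tor-nagios: [fsn-node-02] load is WARNING: WARNING - load average: 38.44, 35.68, 30.18
2020-11-19 04:21:12 <nsa> tor-nagios: [fsn-node-02] load is OK: OK - load average: 11.93, 15.21, 20.84
"""

if __name__ == "__main__":
    for node, stats in sorted(summarize(SAMPLE.splitlines()).items()):
        print(node, stats)
```

    feeding it the full IRC log instead of `SAMPLE` would give a quick ranking of which nodes trip the check most often.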

    machines to move:

    • onionoo-backend-02.torproject.org (maybe get the new metrics service admins to rebuild one of those from scratch?)
    • onionoo-frontend-02.torproject.org (rebuild from scratch?)
    • build-x86-12.torproject.org (we already have build-x86-11.torproject.org - maybe rebuild from scratch too?) - moved to #40135 (closed)

    the following instances will require extra storage, so they are blocked on #40131 (closed) (update: the iSCSI cluster is working well enough for those to start):

    • tb-build-02 - redundant with tb-build-01 (rebuild from scratch?) #40198 (closed)
    • web-fsn-02 - likewise redundant with web-fsn-01 (although maybe just retire and rebuild as web-chi-03?) moved to #40193 (closed)

    tb-build-02 would be particularly nice to migrate, as i suspect it's causing load warnings on fsn-node-03 right now.

    3 of 5 checklist items completed · Edited by anarcat
