check-01 maxes a gigabit again
it looks like check-01 is misbehaving again.
i noticed this while debugging I/O problems with crm-int-01 (tpo/web/civicrm#97, private project). i have found that check-01 was doing about 819mbps on fsn-node-07, and that the node itself was maxed out, because it was also hosting relay-01 which is happily pushing 250mbps on the wire right now.
i'm in the process of migrating check-01 to its secondary (fsn-node-05).
here's a traffic graph of the node and its instances before the migration:
https://grafana.torproject.org/d/53QNFNtZz/traffic-per-class?orgId=1&var-class=All&var-node=check-01.torproject.org%3A9100&var-node=crm-int-01.torproject.org%3A9100&var-node=fsn-node-07.torproject.org%3A9100&var-node=henryi.torproject.org%3A9100&var-node=polyanthum.torproject.org%3A9100&var-node=relay-01.torproject.org%3A9100&var-node=fsn-node-05.torproject.org%3A9100&from=now-24h&to=now
you can clearly see how the traffic "flatlines" somewhere below 1gbps. i bet it could go much higher, possibly in the 10gig range, if we let it.
last time we dealt with this (in #40842), we identified the following next steps:
- check if traffic comes from the same IP addresses, then:
  - block or rate-limit the IPs
  - consider automating this with fail2ban
- if this is legitimate traffic, consider creating a new VM with DNS RR distribution
- if that's not enough, consider adding an nginx cache in front (with per-IP caching of course)
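
for the first step (checking whether the traffic comes from the same IP addresses), here is a minimal sketch of how the counting could be done. it assumes we have some text log where the client address is the first whitespace-separated field on each line. that is an assumption: i have not checked what check-01 actually logs, and the function name `top_talkers` is made up for illustration.

```python
from collections import Counter


def top_talkers(lines, n=10):
    """Return the n most frequent client addresses as (address, hits) pairs.

    Assumes the client address is the first whitespace-separated field
    of each log line (an assumption, not a verified log format).
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:  # ignore blank lines
            counts[fields[0]] += 1
    return counts.most_common(n)


# hypothetical usage, with a made-up log path:
#
#     with open("access.log") as log:
#         for address, hits in top_talkers(log, n=20):
#             print(hits, address)
```

if a handful of addresses account for most of the hits, blocking or rate-limiting is probably enough. if the hits are spread thinly across many addresses, the traffic is more likely legitimate and the DNS RR or cache options are the ones to look at.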