
internal network saturation in gnt-dal cluster

i stumbled upon this while browsing traffic graphs to debug gitlab performance issues (https://gitlab.torproject.org/tpo/tpa/team/-/issues/42152), but it seems like we're saturating the internal gigabit interfaces regularly inside gnt-dal. currently, it's between dal-node-01 and dal-node-02, and it's been going on for months at this stage:

image

https://grafana.torproject.org/d/53QNFNtZz/traffic-per-class?orgId=1&from=now-6M&to=now&timezone=browser&var-class=role%3A%3Aganeti%3A%3Adal&var-node=%24__all&var-device=%24__all&refresh=5s

it looks like this, in isolation:

image

it's from dal-node-02 to dal-node-01. over the last 30 days, the mean is 560 Mbps and the max is 979 Mbps (basically saturation of the gigabit link).
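
to put those numbers in perspective, here's a rough back-of-the-envelope sketch. it assumes a nominal 1000 Mbps line rate and decimal units (1 TB = 10^12 bytes); the function names are made up for illustration and are not part of any existing tooling:

```python
# Back-of-the-envelope numbers for the dal-node-02 -> dal-node-01 link.
# Assumption: nominal gigabit line rate of 1000 Mbps, decimal units.
LINK_MBPS = 1000
MEAN_MBPS = 560  # 30-day mean from the graph
MAX_MBPS = 979   # 30-day max from the graph

SECONDS_PER_DAY = 86_400


def utilization(rate_mbps, link_mbps=LINK_MBPS):
    """Fraction of the link's nominal capacity used at the given rate."""
    return rate_mbps / link_mbps


def terabytes_per_day(rate_mbps):
    """Data volume moved in one day at a constant rate, in TB (10^12 bytes)."""
    bits = rate_mbps * 1_000_000 * SECONDS_PER_DAY
    return bits / 8 / 1e12


if __name__ == "__main__":
    print(f"mean utilization: {utilization(MEAN_MBPS):.1%}")
    print(f"peak utilization: {utilization(MAX_MBPS):.1%}")
    print(f"volume at mean rate: {terabytes_per_day(MEAN_MBPS):.2f} TB/day")
```

under those assumptions, that works out to about 56% mean utilization, 97.9% at peak, and roughly 6 TB per day moving in that one direction.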

it looks like about 4 large transfers, about 6 times per day:

image

note that we also had a concern we were saturating the uplink in the past, which turned out to not really be a problem, see #42021 (closed).
