
GitLab is slow - high CPU and I/O wait

We've been having issues with GitLab since last week. The trouble started on Thursday 11th.

The symptoms that we can see are:

Current status:

  • load spikes are still an issue as of early September 2024
  • load spikes correlate with large CI runs (in tor-browser and friends, in particular), which do lots of concurrent fetches; tracked in tpo/applications/tor-browser#43121 (closed), possible workaround: object cache (#41705)
  • mitigations previously deployed by @brizental seem incomplete, possibly because artifacts storage is also slow; a possible fix is to move to object storage (#41403), but then we need to handle backups (#41415)
  • a multiple-cause scenario looks more and more likely; bots could also be involved, as in the last issues in May 2024 (#41597 (closed))
  • @brizental's experiments discarded the "noisy neighbor" theory for now, although we have a proposal to insert "idle canaries" to test that hypothesis (#41750 (closed))
  • TPA has been considering moving the GitLab VM to another, faster cluster (#41431 (closed)) and scaling up the service by splitting GitLab components (#40479)
Edited by anarcat