Scale out GitLab to 2k users
It seems we have outgrown the initial reference architecture we were on, the "up to 1,000 users" tier: 8 vCPU and 7.2 GB memory on a single server. We've already bumped memory to 16 GB and are still swapping.
Let's look at how to scale this thing out. The next reference architecture is "up to 2k users", and it involves the following:
- load balancer: 2 vCPU, 1.8 GB memory (they suggest haproxy; a config sketch follows this list)
- postgresql: 2 vCPU, 7.5 GB memory
- redis: 1 vCPU, 3.75 GB memory
- gitaly: 4 vCPU, 15 GB memory
- gitlab rails: 2 nodes, each 8 vCPU, 7.2 GB memory
- object storage: unspecified
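
To make the load-balancer tier concrete, here is a minimal haproxy sketch for fronting the two rails nodes. This is an assumption-laden illustration, not the reference architecture's config: all IPs, ports, and server names are hypothetical placeholders, while `/-/readiness` is GitLab's built-in health-check endpoint.

```
# haproxy.cfg -- minimal sketch; all addresses are hypothetical placeholders
global
    daemon
    maxconn 2048

defaults
    mode http
    timeout connect 5s
    timeout client  50s
    timeout server  50s

frontend gitlab_http
    bind *:80
    default_backend gitlab_rails

backend gitlab_rails
    balance roundrobin
    # GitLab exposes /-/readiness as a health-check endpoint
    option httpchk GET /-/readiness
    server rails1 10.0.0.11:80 check
    server rails2 10.0.0.12:80 check
```

Note that GitLab only answers its health-check endpoints from allowlisted IPs by default, so the balancer's address would have to be added to `gitlab_rails['monitoring_whitelist']` on the rails nodes.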
This therefore involves at least 6 machines (the rails tier alone is two nodes), 25 vCPUs (2 + 2 + 1 + 4 + 2 × 8), and about 42 GB of memory, before counting object storage. We currently use 8 cores and 16 GB of memory, so this would roughly triple the hardware usage, and add significant complexity.
On the upside, however, we would have a saner GitLab deployment, in the sense that we could reuse our existing components (e.g. the postgresql backups, #41426 (closed)). And from there on, it's much easier to reliably scale each box, restart components, and work on high availability: each of those components can be made redundant fairly trivially. The exception is perhaps postgresql, but the 3k-user architecture solves that with pgbouncer, although it also introduces more complexity with things like consul (for service discovery) and sentinel (for redis HA).
It should be noted, however, that the instructions still use the GitLab Omnibus package on every node, so we do not get away from that level of complexity, unfortunately.
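
For a feel of what that per-node Omnibus configuration looks like, here is a minimal `gitlab.rb` sketch for one of the rails nodes. The `roles`, `gitlab_rails[...]`, and `git_data_dirs` settings are standard Omnibus options, but every address and the password below are hypothetical placeholders.

```ruby
# /etc/gitlab/gitlab.rb on a rails node -- minimal sketch;
# all IPs and the password are hypothetical placeholders
external_url 'https://gitlab.example.com'

# Run only the application components on this node
roles ['application_role']

# Point at the dedicated postgresql node instead of the bundled one
gitlab_rails['db_host'] = '10.0.0.21'
gitlab_rails['db_password'] = 'CHANGE_ME'

# Point at the dedicated redis node
gitlab_rails['redis_host'] = '10.0.0.22'

# Repositories are served from the gitaly node (8075 is gitaly's default TCP port)
git_data_dirs({
  'default' => { 'gitaly_address' => 'tcp://10.0.0.23:8075' },
})
```

Each node still needs a `gitlab-ctl reconfigure` after any change, so the Omnibus-level complexity is very much still there, just spread across more machines.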