set up new gitaly virtual server

Following the detailed performance analysis in https://gitlab.torproject.org/tpo/tpa/team/-/issues/42152#note_3219821 and as part of the work of scaling gitlab (#40479), let's set up a new virtual machine for Gitaly, with the following specifications:

vCPU: 4
memory: 16GiB
SSD disk: 250GiB (NVMe, just like git-data on gitlab-02 now)
HDD disk: no
responsible team: TPA
contact person: @anarcat
role user / group: none, managed by omnibus/puppet
HTTP proxy: TBD
domain name: gitaly-01.torproject.org

This is based on the 2k reference architecture.

current status

gitaly-01 is online and seems to be working correctly. we're going to run some benchmarks to see if it performs acceptably, and start migrating repositories progressively.

next steps

test container prototype and configuration
container go / no-go
- figure out what to do with hardcoded git paths (they are embedded! we were just missing a setting in the TOML config)
- figure out what to do with the container tag (https://gitlab.com/gitlab-org/build/CNG/-/issues/2223): we'll keep a rolling issue, like for limesurvey, see #42239
- check that containers auto-upgrade
configure gitaly with the real gitlab secret (~~and configure gitlab from trocla? currently in /etc/gitlab/gitlab-secrets.json on gitlab-02~~)
open firewall

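configure gitlab to talk with gitaly, with something like this (gitaly1.internal:9999 is a placeholder address, to be replaced with gitaly-01's):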

gitlab_rails['repositories_storages'] = {
  'default' => { 'gitaly_address' => 'unix:/var/opt/gitlab/gitaly/gitaly.socket' },
  'storage1' => { 'gitaly_address' => 'tls://gitaly1.internal:9999' },
}

then run sudo gitlab-rake gitlab:gitaly:check
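For the "confirm that TLS is in use" step, a first check can be done on the configuration itself: unix: sockets never leave the host, tls:// addresses are encrypted, and tcp:// addresses would be plaintext over the network. This is a minimal Ruby sketch, assuming the storages hash is copied by hand out of /etc/gitlab/gitlab.rb; the insecure_storages helper is hypothetical. It only inspects the configuration, not the traffic, so something like openssl s_client against gitaly-01 is still needed to confirm what is actually on the wire.

```ruby
require "uri"

# Storages hash as written in gitlab.rb (copied from this issue;
# gitaly1.internal:9999 is a placeholder address).
STORAGES = {
  "default"  => { "gitaly_address" => "unix:/var/opt/gitlab/gitaly/gitaly.socket" },
  "storage1" => { "gitaly_address" => "tls://gitaly1.internal:9999" },
}.freeze

# Hypothetical helper: return the names of storages whose Gitaly address
# would carry traffic over the network without TLS.
def insecure_storages(storages)
  plaintext = storages.reject do |_name, settings|
    scheme = URI.parse(settings.fetch("gitaly_address")).scheme
    # unix: sockets stay on the local host, tls:// is encrypted
    %w[unix tls].include?(scheme)
  end
  plaintext.keys
end

bad = insecure_storages(STORAGES)
if bad.empty?
  puts "all storages use a unix socket or TLS"
else
  puts "plaintext storages: #{bad.join(', ')}"
end
```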

confirm that TLS is in use
migrate one repository to gitaly-01 (see https://docs.gitlab.com/administration/operations/moving_repositories/)
benchmark a repository, move it, then benchmark it again (maybe https://gitlab.torproject.org/anarcat/presentations? without submodules!)
propose full migration (tpa-rfc-89)
full migration:
- alpha phase, day one (2025-07-15), dogfooding and automation
  - anarcat (done)
  - tpo/tpa
  - tpo/web
- beta phase, day two (2025-07-16), less critical projects, external testers
  - tpo/community
  - tpo/onion-services
  - tpo/anti-censorship
  - tpo/network-health
- production phase, day two or three (2025-07-16+), remaining projects
  - tpo/core (includes c-tor and Arti!)
  - tpo/applications (includes Tor Browser and Mullvad Browser)
  - all remaining projects (1805 done, 817 to go): ran out of disk space
gitaly cleanup routines to free up space
full gitaly documentation (in particular how to move repos)
migrate remaining 817 projects (in progress)
migrate groups
migrate snippets
switch weights to gitaly-01
analyze failure history to see if we broke anything
deal with remaining repos (10)
- meskio/uget and shelikhoo/uget
- 8 unknowns
merge https://gitlab.torproject.org/tpo/tpa/puppet-control/-/merge_requests/89
set gitaly['enable'] = false on gitlab-02 once fully migrated?
mark rfc-89 obsolete
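The moves in the checklist above (one repository first, then whole namespaces, groups and snippets) go through GitLab's repository storage moves API, described on the moving repositories page linked above. This is a minimal Ruby sketch for a single project, assuming "storage1" is the name gitaly-01 gets in repositories_storages and that an admin token is available in the GITLAB_TOKEN environment variable (both assumptions). It builds the request first and only sends it when the token is set, so it can be inspected safely.

```ruby
require "json"
require "net/http"
require "uri"

GITLAB_URL = "https://gitlab.torproject.org"
# Assumption: the storage name given to gitaly-01 in repositories_storages.
DESTINATION = "storage1"

# Build (but do not send) a request scheduling a storage move for one project.
# The project is referenced by its URL-encoded path, e.g. anarcat/presentations.
def storage_move_request(project_path, destination, token)
  id = URI.encode_www_form_component(project_path)
  uri = URI("#{GITLAB_URL}/api/v4/projects/#{id}/repository_storage_moves")
  request = Net::HTTP::Post.new(uri)
  request["PRIVATE-TOKEN"] = token
  request["Content-Type"] = "application/json"
  request.body = JSON.generate(destination_storage_name: destination)
  request
end

token = ENV["GITLAB_TOKEN"]
if token
  req = storage_move_request("anarcat/presentations", DESTINATION, token)
  Net::HTTP.start(req.uri.host, req.uri.port, use_ssl: true) do |http|
    response = http.request(req)
    # Print status and body; the body should describe the scheduled move.
    puts response.code, response.body
  end
else
  warn "GITLAB_TOKEN not set, not scheduling anything"
end
```

The same API also has endpoints for group and snippet moves, and bulk endpoints that move everything from one storage to another; check the documentation for the GitLab version in use before scripting those.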

deployment notes

Here's the analysis of the deployment options I originally made in #40479:

i feel the next step here would be to create a gitaly VM to spin off that workload entirely. i'm not super excited about running gitlab-omnibus on another server, but it looks like it's the standard way, as the project installation instructions link to https://about.gitlab.com/install/

the "run gitaly on its own server" and "install gitaly" instructions also point at the omnibus package.

there are also source-only install instructions that seem relatively simple, but those require us to set up our own build pipeline, which i'm not super excited about.

docker doesn't seem to be an option, as the docker install instructions link to an "omnibus" image that has all of gitlab.

it's possible to install a gitaly-only helm chart in kubernetes, which means it's also possible to use a docker image. as it turns out, there is one buried deep inside the gitlab helm chart, so that is one option we could use to deploy gitaly standalone, with this image:
registry.gitlab.com/gitlab-org/build/cng/gitaly

once any of that is done, gitaly needs to be configured (https://docs.gitlab.com/administration/gitaly/configure_gitaly/), which is actually relatively simple: gitaly and the other apps share a secret and communicate over TLS, something that could easily be done in puppet.
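To give an idea of how small that configuration is, here is a minimal Ruby sketch rendering a Gitaly config.toml from an ERB template, the same templating language Puppet templates can use. The key names (tls_listen_addr, [tls] certificate_path and key_path, [auth] token, [[storage]] name and path) follow Gitaly's example config.toml; every path, the port and the GITALY_TOKEN variable are hypothetical. The auth token is the shared secret and has to match gitlab_rails['gitaly_token'] on the GitLab side, and the storage name has to match the key used in gitlab_rails['repositories_storages'].

```ruby
require "erb"

# Hypothetical config.toml template; assumes no value contains a double quote.
TEMPLATE = <<~TOML
  tls_listen_addr = "<%= listen %>"

  [tls]
  certificate_path = "<%= cert %>"
  key_path = "<%= key %>"

  [auth]
  # shared secret, must match gitlab_rails['gitaly_token'] in gitlab.rb
  token = "<%= token %>"

  [[storage]]
  # must match the storage name in gitlab_rails['repositories_storages']
  name = "<%= storage %>"
  path = "<%= path %>"
TOML

# Keyword arguments are local variables, so binding exposes them to ERB.
def render_gitaly_config(listen:, cert:, key:, token:, storage:, path:)
  ERB.new(TEMPLATE).result(binding)
end

puts render_gitaly_config(
  listen: "0.0.0.0:9999",
  cert: "/etc/gitaly/tls/gitaly-01.crt",
  key: "/etc/gitaly/tls/gitaly-01.key",
  token: ENV.fetch("GITALY_TOKEN", "changeme"),
  storage: "storage1",
  path: "/var/opt/gitlab/git-data/repositories",
)
```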

the architecture docs show that, by default, gitaly servers are sharded, in that they each host a subset of repositories, based on the "configured weights":

GitLab accesses repositories through the configured repository storages. Each new repository is stored on one of the repository storages based on their configured weights. (source)
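The quoted behavior amounts to a weighted random choice among storages. The sketch below is a generic Ruby illustration of that idea, not GitLab's actual code; the storage names and weights are hypothetical. In GitLab itself the weights are set in the admin area (Settings > Repository > Repository storage), not in gitlab.rb.

```ruby
# Hypothetical weights: all new repositories go to gitaly-01 ("storage1").
WEIGHTS = { "default" => 0, "storage1" => 100 }.freeze

# Pick a storage for a new repository with probability proportional to
# its integer weight; storages with weight 0 never receive new repositories.
def pick_storage(weights, rng: Random.new)
  total = weights.values.sum
  raise ArgumentError, "all weights are zero" if total.zero?

  point = rng.rand(total) # integer in 0...total
  weights.each do |name, weight|
    return name if point < weight

    point -= weight
  end
  raise "unreachable: point is always below the total weight"
end

rng = Random.new(42)
counts = Hash.new(0)
1000.times { counts[pick_storage({ "default" => 25, "storage1" => 75 }, rng: rng)] += 1 }
p counts # roughly a 1:3 split between the two storages
```

Setting a storage's weight to 0 stops new repositories from landing there while leaving existing ones in place, which is why existing repositories still have to be moved explicitly.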

gitaly cluster is more complicated: its architecture shows that it has a multiple-node setup with Praefect as a load balancer and a postgresql database to keep track of state. i don't think we want to go there; see also those design docs.

so, next step: possibly try running gitaly in a container, in a VM, or set up a gitaly-only omnibus package.

then, the repositories can be moved to the new cluster, or some repositories will naturally be created there. in fact, it might be better to create new repositories (according to weight?) on the new VM and test it for a while...

Edited Jul 18, 2025 by anarcat