    print(i, h.hexdigest())
```

Given a list of `hashes`, you can try to guess the project number on
all of them with:

```python
import hashlib

for i in range(20000):
    h = hashlib.sha256()
    h.update(str(i).encode('ascii'))
    if h.hexdigest() in hashes:
        print(i, "is", h.hexdigest())
```

For example:

```
>>> hashes = [
...     "085b2a38876eeddc33e3fbf612912d3d52a45c37cee95cf42cd3099d0a3fd8cb",
...     "1483c82372b98e6864d52a9e4a66c92ac7b568d7f2ffca7f405ea0853af10e89",
...     "23b0cc711cca646227414df7e7acb15e878b93723280f388f33f24b5dab92b0b",
...     "327e892542e0f4097f90d914962a75ddbe9cb0577007d7b7d45dea310086bb97",
...     "54e87e2783378cd883fb63bea84e2ecdd554b0646ec35a12d6df365ccad3c68b",
...     "8952115444bab6de66aab97501f75fee64be3448203a91b47818e5e8943e0dfb",
...     "9dacbde326501c9f63debf4311ae5e2bc047636edc4ee9d9ce828bcdf4a7f25d",
...     "a9346b0068335c634304afa5de1d51232a80966775613d8c1c5a0f6d231c8b1a",
... ]
>>> import hashlib
>>> for i in range(20000):
...     h = hashlib.sha256()
...     h.update(str(i).encode('ascii'))
...     if h.hexdigest() in hashes:
...         print(i, "is", h.hexdigest())
...
518 is 8952115444bab6de66aab97501f75fee64be3448203a91b47818e5e8943e0dfb
522 is a9346b0068335c634304afa5de1d51232a80966775613d8c1c5a0f6d231c8b1a
570 is 085b2a38876eeddc33e3fbf612912d3d52a45c37cee95cf42cd3099d0a3fd8cb
1088 is 9dacbde326501c9f63debf4311ae5e2bc047636edc4ee9d9ce828bcdf4a7f25d
1265 is 23b0cc711cca646227414df7e7acb15e878b93723280f388f33f24b5dab92b0b
1918 is 54e87e2783378cd883fb63bea84e2ecdd554b0646ec35a12d6df365ccad3c68b
2619 is 327e892542e0f4097f90d914962a75ddbe9cb0577007d7b7d45dea310086bb97
2620 is 1483c82372b98e6864d52a9e4a66c92ac7b568d7f2ffca7f405ea0853af10e89
```

Then you can poke around the GitLab API to see if they exist with:

    while read id is hash; do curl -s https://gitlab.torproject.org/api/v4/projects/$id | jq .; done

For example:

```
$ while read id is hash; do curl -s https://gitlab.torproject.org/api/v4/projects/$id | jq .; done <<EOF
518 is 8952115444bab6de66aab97501f75fee64be3448203a91b47818e5e8943e0dfb
522 is a9346b0068335c634304afa5de1d51232a80966775613d8c1c5a0f6d231c8b1a
570 is 085b2a38876eeddc33e3fbf612912d3d52a45c37cee95cf42cd3099d0a3fd8cb
1088 is 9dacbde326501c9f63debf4311ae5e2bc047636edc4ee9d9ce828bcdf4a7f25d
1265 is 23b0cc711cca646227414df7e7acb15e878b93723280f388f33f24b5dab92b0b
1918 is 54e87e2783378cd883fb63bea84e2ecdd554b0646ec35a12d6df365ccad3c68b
2619 is 327e892542e0f4097f90d914962a75ddbe9cb0577007d7b7d45dea310086bb97
2620 is 1483c82372b98e6864d52a9e4a66c92ac7b568d7f2ffca7f405ea0853af10e89
EOF
{
  "message": "404 Project Not Found"
}
{
  "message": "404 Project Not Found"
}
{
  "message": "404 Project Not Found"
}
{
  "message": "404 Project Not Found"
}
{
  "message": "404 Project Not Found"
}
{
  "message": "404 Project Not Found"
}
{
  "message": "404 Project Not Found"
}
{
  "message": "404 Project Not Found"
}
```

... those were all deleted repositories.

## Counting projects

While the GitLab API is "paged", which suggests you need to iterate
over all pages to count entries, some requests return special
`x-total` headers carrying the total count. This, for example, shows
you the total number of projects on a given Gitaly backend:

    curl -v -s -H "PRIVATE-TOKEN: $PRIVATE_TOKEN" \
        "https://gitlab.torproject.org/api/v4/projects?repository_storage=default&simple=true" \
        2>&1 | grep x-total

This, for example, was the spread between the two Gitaly servers
during that [epic migration](https://gitlab.torproject.org/tpo/tpa/team/-/issues/42225):

```
anarcat@angela:fabric-tasks$ curl -v -s -X GET -H "PRIVATE-TOKEN: $PRIVATE_TOKEN" "https://gitlab.torproject.org/api/v4/projects?repository_storage=default&simple=true" 2>&1 | grep x-total
< x-total: 817
< x-total-pages: 41
anarcat@angela:fabric-tasks$ curl -v -s -X GET -H "PRIVATE-TOKEN: $PRIVATE_TOKEN" "https://gitlab.torproject.org/api/v4/projects?repository_storage=storage1&simple=true" 2>&1 | grep x-total
< x-total: 1805
< x-total-pages: 91
```

The `default` server had 817 projects and `storage1` had 1805.
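
The same count can be read from a script; here's a minimal sketch,
using only the standard library (a valid personal access token is
assumed, and the `storage_count_url` helper is ours, not part of any
API):

```python
import urllib.request

API = "https://gitlab.torproject.org/api/v4/projects"

def storage_count_url(storage):
    # repository_storage filters by Gitaly backend; simple=true trims the payload
    return f"{API}?repository_storage={storage}&simple=true"

def count_projects(storage, token):
    """Total projects on a Gitaly storage, read from the x-total header."""
    req = urllib.request.Request(
        storage_count_url(storage), headers={"PRIVATE-TOKEN": token}
    )
    with urllib.request.urlopen(req) as resp:
        # no pagination needed: GitLab sends the grand total in a header
        return int(resp.headers["x-total"])
```

Calling `count_projects("storage1", token)` should match the
`x-total` value shown by the curl invocation above.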

## Connect to the PostgreSQL server

We previously had instructions on how to connect to the GitLab Omnibus
PostgreSQL server, with the [upstream instructions](https://docs.gitlab.com/omnibus/maintenance/#starting-a-postgresql-superuser-psql-session), but this is now
deprecated. Normal PostgreSQL procedures should just work, like:

    sudo -u postgres psql

## Moving projects between Gitaly servers

If there are multiple Gitaly servers (and there currently aren't:

[...]

Note that those are two different integers: the first one is the
`move_id` returned by the move API call, and the second is the project
ID. Both are visible in the `move-repo` output.

Note that some repositories just can't be moved. We found two (out of
thousands of) repositories like this during the `gitaly-01` migration
that were giving the error `invalid source repository`. It's unclear
why this happened; in this case, the simplest solution was to destroy
the project and recreate it, because the project was small and didn't
have anything but the Git repository.

See also the [underlying design of repository moves](https://docs.gitlab.com/development/repository_storage_moves/).
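
The underlying calls are the [project repository storage moves API](https://docs.gitlab.com/api/project_repository_storage_moves/);
here's a hedged sketch of scheduling a single move and polling its
state (standard library only, an admin token is assumed, endpoint
paths per the upstream API documentation):

```python
import json
import urllib.request

BASE = "https://gitlab.torproject.org/api/v4"

def schedule_move(project_id, destination, token):
    """Schedule a repository storage move, returning the move_id."""
    body = json.dumps({"destination_storage_name": destination}).encode()
    req = urllib.request.Request(
        f"{BASE}/projects/{project_id}/repository_storage_moves",
        data=body,
        headers={"PRIVATE-TOKEN": token, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        move = json.load(resp)
    # note the two different integers discussed above:
    # move["id"] is the move_id, move["project"]["id"] the project ID
    return move["id"]

def move_state(move_id, token):
    """Return the move state (e.g. scheduled, started, finished, failed)."""
    req = urllib.request.Request(
        f"{BASE}/project_repository_storage_moves/{move_id}",
        headers={"PRIVATE-TOKEN": token},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["state"]
```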

### Moving groups of repositories

[...]

Let's say you're migrating from the gitaly storage `default` to
`storage1` [...]

    fab gitlab.list-projects --storage=default

In the `gitaly-01` migration, even after the above returned empty, a
bunch of projects were left on disk. It was found they were actually
deleted projects, so they were destroyed.

While the migration happened, the Grafana panels [repository count per
server](https://grafana.torproject.org/d/QrDJktiMz/gitlab-omnibus?orgId=1&refresh=1m&from=now-24h&to=now&timezone=utc&var-node=gitlab-02.torproject.org&viewPanel=panel-47), [disk usage](https://grafana.torproject.org/d/zbCoGRjnz/disk-usage), [CPU usage](https://grafana.torproject.org/d/gex9eLcWz/cpu-usage) and [sidekiq](https://grafana.torproject.org/d/c3201e86-7dde-4897-9d67-a161d0b8d2bf/gitlab-sidekiq?folderUid=faa9db2b-c105-4c67-8f83-a918aaeac5e5&orgId=1&from=now-24h&to=now&timezone=utc&var-query0&var-node=gitlab-02.torproject.org&var-alias=gitlab-02.torproject.org) were used
to keep track of progress. We also kept an eye on [workhorse latency](https://grafana.torproject.org/d/QrDJktiMz/gitlab-omnibus?orgId=1&refresh=auto&from=now-2d&to=now&timezone=browser&var-node=gitlab-02.torproject.org&viewPanel=panel-21).

[...]

gracefully: the job is marked as failed, and it moves on to the next
one. Then housekeeping can be run and the moves can be resumed.

[Heuristical housekeeping](https://docs.gitlab.com/administration/housekeeping/#heuristical-housekeeping) can be scheduled by tweaking
gitaly's `daily_maintenance.start_hour` setting. Note that if you see
a message like:

```
msg="maintenance: repo optimization failure" error="could not repack: repack failed: signal: terminated: context deadline exceeded"
```

... this means the job was terminated after running out of time. Raise
the `duration` of the job to fix this.

It might be *possible* that scheduling a maintenance *while* doing the
migration could resolve the disk space issue.
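
For reference, the relevant `config.toml` section looks something like
this (the values are examples, not our production settings; `duration`
is the knob to raise when jobs hit the deadline):

```toml
# /etc/gitaly/config.toml: nightly housekeeping window (example values)
[daily_maintenance]
start_hour = 3       # UTC hour at which maintenance starts
start_minute = 45
duration = "45m"     # raise this on "context deadline exceeded" failures
storages = ["storage1"]
```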

Note that maintenance logs can be tailed on gitaly-01 with:

[...]

Or this will show maintenance tasks that take longer than one second:

    journalctl -o cat -u gitaly --since 2025-07-17T03:45 -f | jq -c '. | select (.source == "maintenance.daily") | select (.time_ms > 1000)'

## Running Git on the Gitaly server

While it's possible to run Git directly on the repositories in
`/home/git/repositories`, it's actually not recommended. First, `git`
is not actually shipped inside the Gitaly container (it's embedded in
the binary), so you need to call `git` *through* Gitaly to get through
to it. For example:

    podman run --rm -it --entrypoint /usr/local/bin/gitaly --user git:git \
        -v /home/git/repositories:/home/git/repositories \
        -v /etc/gitaly/config.toml:/etc/gitaly/config.toml \
        registry.gitlab.com/gitlab-org/build/cng/gitaly:18-2-stable git

But even if you figure out that magic, the GitLab folks advise against
running Git commands directly on Gitaly-managed repositories, because
Gitaly holds its own internal view of the Git repo, and changing the
underlying repository might create inconsistencies.

See the [direct access to repositories](https://docs.gitlab.com/administration/gitaly/#directly-accessing-repositories) documentation for more
background. That said, it seems like as long as you don't mess with
the `refs`, you should be fine. If you don't know what that means,
don't mess with the Git repos directly until you know what Git `refs`
are. If you do know, then you might be able to use `git` directly (as
the `git` user!) even without going through `gitaly git`.

The `gitaly git` command is [documented upstream here](https://docs.gitlab.com/administration/gitaly/troubleshooting/#use-gitaly-git-when-git-is-required-for-troubleshooting).

## Pager playbook

[...]

service is running on the Gitaly side:

[...]

Check the load on the server as well.

You can inspect the disk usage of the Gitaly server with:

```
Gitlab::GitalyClient::ServerService.new("default").storage_disk_statistics
```

Note that, as of this writing, the `gitlab:gitaly:check` job actually
raises an error:

[...]
migration. The configuration was kept because [GitLab requires a
default repository storage](https://docs.gitlab.com/administration/gitaly/configure_gitaly/#gitlab-requires-a-default-repository-storage), a [known (since 2019) issue](https://gitlab.com/gitlab-org/gitlab/-/issues/36175). See
[anarcat's latest comment on this](https://gitlab.com/gitlab-org/gitlab/-/issues/36175#note_2634803728).

Finally, you can run `gitaly check` to see what Gitaly itself thinks
of its status, with:

    podman run -it --rm --entrypoint /usr/local/bin/gitaly \
        --network host --user git:git \
        -v /home/git/repositories:/home/git/repositories \
        -v /etc/gitaly/config.toml:/etc/gitaly/config.toml \
        -v /etc/ssl/private/gitaly-01.torproject.org.key:/etc/gitlab/ssl/key.pem \
        -v /etc/ssl/torproject/certs/gitaly-01.torproject.org.crt-chained:/etc/gitlab/ssl/cert.pem \
        registry.gitlab.com/gitlab-org/build/cng/gitaly:18-2-stable check /etc/gitaly/config.toml

Here's an example of a successful check:

```
root@gitaly-01:/# podman run --rm --entrypoint /usr/local/bin/gitaly --network host --user git:git -v /home/git/repositories:/home/git/repositories -v /etc/gitaly/config.toml:/etc/gitaly/config.toml -v /etc/ssl/private/gitaly-01.torproject.org.key:/etc/gitlab/ssl/key.pem -v /etc/ssl/torproject/certs/gitaly-01.torproject.org.crt-chained:/etc/gitlab/ssl/cert.pem registry.gitlab.com/gitlab-org/build/cng/gitaly:18-1-stable check /etc/gitaly/config.toml
Checking GitLab API access: OK
GitLab version: 18.1.2-ee
GitLab revision:
GitLab Api version: v4
Redis reachable for GitLab: true
OK
```

See also the [upstream Gitaly troubleshooting guide](https://docs.gitlab.com/administration/gitaly/troubleshooting/)
and unit failures, below.

### Gitaly unit failure

If there's a unit failure on Gitaly, it's likely because of a health
check failure.
The Gitaly container has a health check which essentially checks that
a process named `gitaly` listens on the network inside the
container. This overrides the upstream check, which only probes the
plain text port, which we have disabled, as we use our normal Let's
Encrypt certificates for TLS to communicate between Gitaly and its
clients. You can run the health check manually with:

    podman healthcheck run systemd-gitaly; echo $?

If it prints nothing and returns zero, it's healthy; otherwise it will
print `unhealthy`.

You can do a manual check of the configuration with:

    podman run --rm --entrypoint /usr/local/bin/gitaly --network host --user git:git -v /home/git/repositories:/home/git/repositories -v /etc/gitaly/config.toml:/etc/gitaly/config.toml -v /etc/ssl/private/gitaly-01.torproject.org.key:/etc/gitlab/ssl/key.pem -v /etc/ssl/torproject/certs/gitaly-01.torproject.org.crt-chained:/etc/gitlab/ssl/cert.pem registry.gitlab.com/gitlab-org/build/cng/gitaly:18-1-stable check /etc/gitaly/config.toml

The commandline is derived from the `ExecStart` you can find in:

    systemctl cat gitaly | grep ExecStart
Unit failures are a little weird, because they're not obviously
associated with the `gitaly.service` unit. They get an opaque service
name. Here's an example failure:
```
root@gitaly-01:/# systemctl reset-failed
root@gitaly-01:/# systemctl --failed
  UNIT LOAD ACTIVE SUB DESCRIPTION

0 loaded units listed.
root@gitaly-01:/# systemctl restart gitaly
root@gitaly-01:/# systemctl --failed
  UNIT                                                                                      LOAD   ACTIVE SUB    DESCRIPTION >
● 03c9d594fe7f8d88b3a95e7c96bad3f6c77e7db2ea3ae094a5528eaa391ccbe5-5a87694937278ce9.service loaded failed failed [systemd-run] /usr/bin>

Legend: LOAD   → Reflects whether the unit definition was properly loaded.
        ACTIVE → The high-level unit activation state, i.e. generalization of SUB.
        SUB    → The low-level unit activation state, values depend on unit type.

1 loaded units listed.
root@gitaly-01:/# systemctl status 03c9d594fe7f8d88b3a95e7c96bad3f6c77e7db2ea3ae094a5528eaa391ccbe5-5a87694937278ce9.service | cat
× 03c9d594fe7f8d88b3a95e7c96bad3f6c77e7db2ea3ae094a5528eaa391ccbe5-5a87694937278ce9.service - [systemd-run] /usr/bin/podman healthcheck run 03c9d594fe7f8d88b3a95e7c96bad3f6c77e7db2ea3ae094a5528eaa391ccbe5
     Loaded: loaded (/run/systemd/transient/03c9d594fe7f8d88b3a95e7c96bad3f6c77e7db2ea3ae094a5528eaa391ccbe5-5a87694937278ce9.service; transient)
  Transient: yes
     Active: failed (Result: exit-code) since Thu 2025-07-10 14:26:44 UTC; 639ms ago
   Duration: 180ms
 Invocation: ad6b3e2068cb42ac957fc43968a8a827
TriggeredBy: ● 03c9d594fe7f8d88b3a95e7c96bad3f6c77e7db2ea3ae094a5528eaa391ccbe5-5a87694937278ce9.timer
    Process: 111184 ExecStart=/usr/bin/podman healthcheck run 03c9d594fe7f8d88b3a95e7c96bad3f6c77e7db2ea3ae094a5528eaa391ccbe5 (code=exited, status=1/FAILURE)
   Main PID: 111184 (code=exited, status=1/FAILURE)
   Mem peak: 13.4M
        CPU: 98ms

Jul 10 14:26:44 gitaly-01 podman[111184]: 2025-07-10 14:26:44.42421901 +0000 UTC m=+0.121253308 container health_status 03c9d594fe7f8d88b3a95e7c96bad3f6c77e7db2ea3ae094a5528eaa391ccbe5 (image=registry.gitlab.com/gitlab-org/build/cng/gitaly:18-1-stable, name=systemd-gitaly, health_status=starting, health_failing_streak=2, health_log=, build-url=https://gitlab.com/gitlab-org/build/CNG/-/jobs/10619101696, io.openshift-min-memory=200Mi, io.openshift.non-scalable=false, io.openshift.tags=gitlab-gitaly, io.k8s.description=GitLab Gitaly service container., io.openshift.wants=gitlab-webservice, io.openshift.min-cpu=100m, PODMAN_SYSTEMD_UNIT=gitaly.service, build-job=gitaly, build-pipeline=https://gitlab.com/gitlab-org/build/CNG/-/pipelines/1915692529)
Jul 10 14:26:44 gitaly-01 systemd[1]: 03c9d594fe7f8d88b3a95e7c96bad3f6c77e7db2ea3ae094a5528eaa391ccbe5-5a87694937278ce9.service: Main process exited, code=exited, status=1/FAILURE
Jul 10 14:26:44 gitaly-01 systemd[1]: 03c9d594fe7f8d88b3a95e7c96bad3f6c77e7db2ea3ae094a5528eaa391ccbe5-5a87694937278ce9.service: Failed with result 'exit-code'.
root@gitaly-01:/# podman healthcheck run systemd-gitaly
unhealthy
```

In that case, the problem was that the [health check script](https://gitlab.com/gitlab-org/build/CNG/-/blob/master/gitaly/scripts/healthcheck?ref_type=heads) was
hardcoding the plain text port number. This was fixed in our container
configuration.

### Gitaly not enabled

If Gitaly is marked as "not enabled" in the [Gitaly servers admin
interface](https://gitlab.torproject.org/admin/gitaly_servers), it is generally because GitLab can't connect to
it.
### 500 error on Gitaly admin interface

It's also possible that the entire page gives a 500 server error. In
that case, look at `/var/log/gitlab/gitlab-rails/production.log`.

If you get a `permission denied: wrong hmac signature` error, it's
because the `auth.token` Gitaly setting doesn't match the secret
configured on the GitLab server, see [this question](https://support.gitlab.com/hc/en-us/articles/20336430607260-500-error-on-GitLab-homepage-or-project-page-and-permission-denied-wrong-hmac-signature-error-in-Gitaly-Praefect-logs). Note that the
secret needs to be configured in the `repositories_storages` setting,
*not* the `gitaly['configuration'] = { auth: ... }` section.

## Disaster recovery

In case the entire GitLab machine is destroyed, a new server should be

[...]

tag (see upstream issue [gitlab-org/build/CNG#2223](https://gitlab.com/gitlab-org/build/CNG/-/issues/2223));
the container image is in the [upstream CNG project](https://gitlab.com/gitlab-org/build/CNG/-/tree/578ae04e7515f5fde1a18dbe50b6bfddbd4a9719/gitaly).
Configuration on the host is inside `/etc/gitaly/config.toml`, which
includes secrets. Each Gitaly server has one or more `storage` entries
which MUST match the entries defined on the Gitaly clients (typically
GitLab Rails). For example, `gitaly-01` has a `storage1` configuration
in its `config.toml` file and is referred to as `storage1` in GitLab's
`gitlab.rb` file. Multiple storage backends could be used to have
different tiers of storage (e.g. NVMe, SSD, HDD) for different
repositories.

The configuration file and `/home/git/repositories` are bind-mounted
inside the container, which runs as the `git` user inside the
container and on the host (but not in rootless mode), in "host"
network mode (so ports are exposed directly inside the VM).
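
As a sketch, the Gitaly side of that matching pair could look like
this (the values are illustrative, check the real `config.toml`):

```toml
# /etc/gitaly/config.toml (sketch; the name must match the client side)
[[storage]]
name = "storage1"                 # referenced as "storage1" in gitlab.rb
path = "/home/git/repositories"
```

On the client side, `gitlab.rb` would carry a matching entry along the
lines of `gitlab_rails['repositories_storages'] = { 'storage1' => {
'gitaly_address' => 'tls://gitaly-01.torproject.org:PORT' } }` (the
address and port here are placeholders).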

Once configured, make sure the health checks are okay, see [Gitaly
unit failure](#gitaly-unit-failure) for details.
Gitaly has multiple clients: the GitLab rails app, Sidekiq, and so
on. From our perspective, there's "the gitlab server" (`gitlab-02`)

[...]

instance.
|
|
|
[gitaly section of the admin interface](https://gitlab.torproject.org/admin/gitaly_servers ) to see if it works
|
|
|
correctly. If it gives a 500 error, look in the GitLab rails
|
|
|
application logs (`/var/log/gitlab/gitlab-rails/production.log`) for
|
|
|
the error message. If you get a `permission denied: wrong hmac
|
|
|
signature`, it's because the `auth.token` Gitaly setting doesn't match
|
|
|
the secret configured on the GitLab server, see [this
|
|
|
question](https://support.gitlab.com/hc/en-us/articles/20336430607260-500-error-on-GitLab-homepage-or-project-page-and-permission-denied-wrong-hmac-signature-error-in-Gitaly-Praefect-logs). Note that the secret needs to be configured in the
|
|
|
`repositories_storages` setting, *not* the `gitaly['configuration'] =
|
|
|
{ auth: ... }` section.
|
|
|
correctly. If it fails, see [500 error on Gitaly admin interface](#500-error-on-gitaly-admin-interface
|
|
|
).
|
|
|
|
|
|
Use `gitlab-rake gitlab:gitaly:check` on the GitLab server to check
|
|
|
the Gitaly configuration, here's an example of a working configuration:
|

[...]

    Checking Gitaly ... Finished

Repositories are *sharded* across servers, that is, a repository is
stored only on *one* server and *not* replicated across the fleet. The
[repository weight](https://gitlab.torproject.org/help/administration/repository_storage_paths.md#configure-where-new-repositories-are-stored) determines the odds of a repository ending up
on a given Gitaly server. As of this writing, the `default` server is
now legacy, so its weight is `0`, which means repositories are not
automatically assigned to it, but [repositories can be moved](https://docs.gitlab.com/administration/operations/moving_repositories/)
individually or in batch, through the GitLab API. Note that the
`default` server has been turned off, so any move will result in a
failure.
Weights can be configured in the [repositories section of the GitLab
admin interface](https://gitlab.torproject.org/admin/application_settings/repository#js-repository-storage-settings).
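
To illustrate the weight semantics (this is a model, not GitLab's
actual code): a new repository lands on a storage with probability
proportional to its weight, so a weight of `0` means a storage never
receives new repositories:

```python
import random

def pick_storage(weights, rng=random):
    """Pick a storage with probability weight / sum(weights)."""
    # storages with weight 0 can never be selected
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names])[0]

weights = {"default": 0, "storage1": 100}
# with default at weight 0, every new repository lands on storage1
picks = {pick_storage(weights) for _ in range(1000)}
```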

The performance impact of moving to an external Gitaly server was
found to be either negligible or an improvement [during benchmarks](https://gitlab.torproject.org/tpo/tpa/team/-/issues/42225#note_3224581).
## Upgrades

[...]

Gitaly's container follows a minor release and needs to be updated
when new minor releases come out. We've [asked upstream to improve on
this](https://gitlab.com/gitlab-org/build/CNG/-/issues/2223), but for now this requires some manual work.

We have a [tracking issue](https://gitlab.torproject.org/tpo/tpa/team/-/issues/42239) with periodically shifting reminders
that's manually tracking this work.

Podman should automatically upgrade containers on that minor release
branch, however.
To perform the upgrade, assuming we're upgrading from 18.1 to 18.2:

 1. look for the current image in the `Image` field of the
    `site/profile/files/gitaly/gitaly.container` unit, for example:

        Image=registry.gitlab.com/gitlab-org/build/cng/gitaly:18-1-stable

 2. check if the new image is available by pulling it from any
    container runtime (this can be done on your laptop or `gitaly-01`,
    it does not matter):

        podman pull registry.gitlab.com/gitlab-org/build/cng/gitaly:18-2-stable

 3. check the [release notes](https://about.gitlab.com/releases/categories/releases/) for anything specific to Gitaly (for
    example, the [18.2 release notes](https://about.gitlab.com/releases/2025/07/17/gitlab-18-2-released/) do not mention Gitaly at all,
    so it's likely a noop upgrade)

 4. change the container to chase the new stable release:

        Image=registry.gitlab.com/gitlab-org/build/cng/gitaly:18-2-stable

 5. commit and push to a feature branch

 6. run Puppet on the Gitaly server(s):

        cumin 'P:gitaly' 'patc --environment gitaly'

 7. test Gitaly

 8. merge the feature branch on success

 9. update the due date to match the next expected release on the
    [tracking issue](https://gitlab.torproject.org/tpo/tpa/team/-/issues/42239), currently the third Thursday of the month,
    see the [versioning docs upstream](https://docs.gitlab.com/policy/maintenance/#versioning)

 10. assign the tracking issue to whoever will be the star that week
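
The "third Thursday" due date in step 9 can be computed instead of
eyeballed; here's a small stdlib helper (the release-day rule is the
one stated above and may change upstream):

```python
import datetime

def third_thursday(year, month):
    """Return the date of the third Thursday of the given month."""
    first = datetime.date(year, month, 1)
    # weekday(): Monday is 0 ... Thursday is 3
    first_thursday = 1 + (3 - first.weekday()) % 7
    return first.replace(day=first_thursday + 14)

# GitLab 18.2 was released on 2025-07-17, the third Thursday of July
```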

## SLA

<!-- this describes an acceptable level of service for this service -->