document how to deal with "gitlab server full" issues, authored by anarcat
......@@ -146,6 +146,42 @@ The `tpa-du-gl-volumes` script can also be used to analyse which
project is using the most disk space. Then those pipelines can be
adjusted to cache less.
### Disk full on GitLab server
If the main GitLab server is running out of disk space, the space is
most likely being used by projects. We've typically had trouble with
artifacts taking up space (tpo/tpa/team#40615, tpo/tpa/team#40517).
You can see the largest disk users in the GitLab admin area in
[Overview -> Projects -> Sort by: Largest repository](https://gitlab.torproject.org/admin/projects?sort=storage_size_desc).
Note that, although it's unlikely, it's technically possible that an
archived project takes up space, so make sure you check the "Show
archived projects" option in the "Sort by" drop down.
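The same listing can probably be pulled from the GitLab REST API, which is handy when comparing several projects at once. The following is only a sketch: it assumes `curl` and `jq` are installed, that `GITLAB_TOKEN` (a name chosen here for illustration) holds an administrator's personal access token, and that the instance allows `order_by=storage_size`, which upstream restricts to administrators. The function names are hypothetical.

```shell
#!/bin/sh
# Build the API URL listing the COUNT largest projects (default 20),
# sorted by total storage size, with per-project statistics included.
largest_projects_url() {
    base=${1%/}      # strip a trailing slash from the base URL, if any
    count=${2:-20}
    printf '%s/api/v4/projects?statistics=true&order_by=storage_size&sort=desc&per_page=%s\n' \
        "$base" "$count"
}

# Print "<storage bytes> <artifact bytes> <project path>" for each project.
# Assumption: GITLAB_TOKEN is set to an administrator's access token.
largest_projects() {
    curl --silent --show-error --fail \
        --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
        "$(largest_projects_url "$1" "$2")" |
        jq -r '.[] | "\(.statistics.storage_size) \(.statistics.job_artifacts_size) \(.path_with_namespace)"'
}
```

For example, `largest_projects https://gitlab.torproject.org 10` should list the ten largest projects. The numbers come from the same accounting as the web interface, so they are subject to the same upstream counting issues.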
In the past, we have worked around that problem by reducing the
default artifact retention period from 4 to 2 weeks
(tpo/tpa/team#40516), but obviously that does not take effect
immediately.
More recently, we have tried to tweak individual projects' retention
policies and scheduling strategies (details in tpo/tpa/team#40615).
Please be aware of the [known upstream issues](#known-upstream-issues) that affect those
diagnostics as well.
To see whether expiration policies work (or whether "kept" artifacts
or old `job.log` files are a problem), use this command (which takes a
while to run):
    find -type f -mtime +14 -print0 | du --files0-from=- -c -h | tee find-mtime+14-du.log
To limit this to `job.log` files, you can do:
    find -name "job.log" -mtime +14 -print0 | du --files0-from=- -c -h | tee find-mtime+14-joblog-du.log
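The `find | du` commands above print one line per file, which is useful for spotting individual offenders but hard to compare across runs. As a rough complement, here is a sketch that prints a single byte total for files older than a given number of days. It assumes GNU `find` (for `-printf`), and `old_files_bytes` is a hypothetical helper name, not an existing script. It sums apparent file sizes, so the result can differ slightly from what `du` reports.

```shell
#!/bin/sh
# Sum the apparent size, in bytes, of regular files under DIR that were
# last modified more than DAYS days ago. An optional third argument
# restricts the search to files matching a `find -name` glob.
# Assumption: GNU find, for the -printf action.
old_files_bytes() {
    dir=$1
    days=$2
    name=${3:-*}
    find "$dir" -type f -name "$name" -mtime +"$days" -printf '%s\n' |
        awk '{ total += $1 } END { printf "%.0f\n", total }'
}
```

For example, `old_files_bytes . 14 job.log` gives the total size of `job.log` files older than two weeks under the current directory, and `old_files_bytes . 14` gives the total for all old files.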
TODO: document how to safely remove old artifacts and `job.log` files.
### DNS resolution failures
Under certain circumstances (upgrades?) Docker loses DNS resolution
......@@ -485,6 +521,18 @@ runner](https://gitlab.com/gitlab-org/gitlab-runner/-/issues) and a [project pag
[File]: https://gitlab.torproject.org/tpo/tpa/gitlab/-/issues/new
[search]: https://gitlab.torproject.org/tpo/tpa/gitlab/-/issues
### Known upstream issues
* job log files (`job.log`) do *not* get automatically purged, even
if their related artifacts get purged (see [upstream feature
request 17245](https://gitlab.com/gitlab-org/gitlab/-/issues/17245)).
* the web interface might not correctly count disk usage of objects
related to a project ([upstream issue 228681](https://gitlab.com/gitlab-org/gitlab/-/issues/228681)) and certainly
doesn't count container images or volumes in disk usage
* [kept artifacts cannot be unkept](https://gitlab.com/gitlab-org/gitlab/-/issues/289954)
## Monitoring and testing
To test a runner, it can be registered only with a project, to run
......
......