The `tpa-du-gl-volumes` script can also be used to analyse which
project is using the most disk space. Then those pipelines can be
adjusted to cache less.

### Disk full on GitLab server

If the main GitLab server is running out of space, then it's projects
that are taking up space. We've typically had trouble with artifacts
taking up space (tpo/tpa/team#40615, tpo/tpa/team#40517).

You can see the largest disk users in the GitLab admin area in
[Overview -> Projects -> Sort by: Largest repository](https://gitlab.torproject.org/admin/projects?sort=storage_size_desc).

Note that, although it's unlikely, it's technically possible that an
archived project takes up space, so make sure you check the "Show
archived projects" option in the "Sort by" drop-down.

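The same ranking can also be fetched from the GitLab REST API, which is convenient for scripting and shows artifact usage separately. This is a minimal sketch, not existing TPA tooling: the `largest_projects` function and the `GITLAB_URL` and `GITLAB_TOKEN` variables are hypothetical, and it assumes an administrator token, since upstream restricts ordering by `storage_size` to administrators. As far as we understand the API, archived projects are included unless an `archived` filter is passed.

```shell
# Hypothetical helper (not part of TPA tooling): fetch the 20 projects
# using the most storage from the GitLab projects API, including
# per-project statistics. Assumes GITLAB_TOKEN holds an administrator
# token; GITLAB_URL defaults to the Tor Project instance.
largest_projects() {
    curl --silent --show-error --fail --get \
        --header "PRIVATE-TOKEN: ${GITLAB_TOKEN:?set GITLAB_TOKEN}" \
        --data-urlencode "order_by=storage_size" \
        --data-urlencode "sort=desc" \
        --data-urlencode "statistics=true" \
        --data-urlencode "per_page=20" \
        "${GITLAB_URL:-https://gitlab.torproject.org}/api/v4/projects"
}
```

The JSON output can be reduced to sizes in bytes with `jq`, for example `largest_projects | jq -r '.[] | "\(.statistics.storage_size)\t\(.statistics.job_artifacts_size)\t\(.path_with_namespace)"'`, where `job_artifacts_size` shows how much of each project's storage is artifacts.
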
In the past, we have worked around that problem by reducing the
default artifact retention period from 4 to 2 weeks
(tpo/tpa/team#40516), but this obviously does not take effect
immediately, since existing artifacts keep the expiry date they were
created with.

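This section does not say how the default retention was changed. Assuming it was the instance-wide default artifact expiration setting, a minimal sketch of the same change through the GitLab application settings API could look like this. The `set_default_artifact_expiry` function and the `GITLAB_URL` and `GITLAB_TOKEN` variables are hypothetical, and an administrator token is assumed. It only affects artifacts created after the change.

```shell
# Hypothetical helper (not part of TPA tooling): set the instance-wide
# default expiration for new job artifacts through the application
# settings API. Assumes GITLAB_TOKEN holds an administrator token;
# GITLAB_URL defaults to the Tor Project instance.
set_default_artifact_expiry() {
    curl --silent --show-error --fail --request PUT \
        --header "PRIVATE-TOKEN: ${GITLAB_TOKEN:?set GITLAB_TOKEN}" \
        --data-urlencode "default_artifacts_expire_in=${1:?usage: set_default_artifact_expiry DURATION}" \
        "${GITLAB_URL:-https://gitlab.torproject.org}/api/v4/application/settings"
}
```

For example, `set_default_artifact_expiry "2 weeks"` would request the 2-week retention mentioned above.
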
More recently, we have tried to tweak individual projects' retention
policies and scheduling strategies (details in tpo/tpa/team#40615).

Please be aware of the [known upstream issues](#known-upstream-issues) that also
affect these diagnostics.

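To see if expiration policies work (or if "kept" artifacts or old
`job.log` files are a problem), run this command from the directory
where artifacts are stored. It totals the disk usage of files last
modified more than 14 days ago, that is, older than the 2-week
retention period, and takes a while to run:
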
    find . -type f -mtime +14 -print0 | du --files0-from=- -c -h | tee find-mtime+14-du.log

To limit this to `job.log` files:

    find . -type f -name "job.log" -mtime +14 -print0 | du --files0-from=- -c -h | tee find-mtime+14-joblog-du.log

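If only the grand total is needed, the `du` listings above can be replaced by a lighter sketch that sums apparent file sizes with GNU `find -printf`. The `old_bytes` function is hypothetical, and apparent sizes can differ from the allocated blocks `du` reports, so the two totals will not match exactly.

```shell
# Hypothetical helper (not part of TPA tooling): print the total apparent
# size, in bytes, of regular files under DIR (default: .) last modified
# more than DAYS days ago (default: 14), with find's -mtime +DAYS
# semantics. Requires GNU find for -printf.
old_bytes() {
    find "${1:-.}" -type f -mtime +"${2:-14}" -printf '%s\n' |
        awk '{ total += $1 } END { printf "%.0f\n", total }'
}
```

For example, `old_bytes . 14 | numfmt --to=iec` prints a human-readable total for files older than two weeks under the current directory.
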
TODO: document how to safely remove old artifacts and `job.log` files.

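Until that is documented, a read-only first step is to see which old `job.log` files are the largest. This sketch only lists files and deletes nothing; the `list_old_joblogs` function is hypothetical and requires GNU find.

```shell
# Hypothetical helper (not part of TPA tooling): list the COUNT largest
# job.log files (default: 20) under DIR (default: .) last modified more
# than DAYS days ago (default: 14), as "<bytes><TAB><path>", biggest
# first. Read-only: nothing is deleted.
list_old_joblogs() {
    find "${1:-.}" -type f -name job.log -mtime +"${2:-14}" -printf '%s\t%p\n' |
        sort -rn |
        head -n "${3:-20}"
}
```

For example, `list_old_joblogs . 14 50` shows the 50 biggest `job.log` files older than two weeks under the current directory.
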
### DNS resolution failures

Under certain circumstances (upgrades?) Docker loses DNS resolution

...

[File]: https://gitlab.torproject.org/tpo/tpa/gitlab/-/issues/new
[search]: https://gitlab.torproject.org/tpo/tpa/gitlab/-/issues

### Known upstream issues

* job log files (`job.log`) do *not* get automatically purged, even
  if their related artifacts get purged (see [upstream feature request 17245](https://gitlab.com/gitlab-org/gitlab/-/issues/17245)).

* the web interface might not correctly count disk usage of objects
  related to a project ([upstream issue 228681](https://gitlab.com/gitlab-org/gitlab/-/issues/228681)), and certainly
  does not include container images or volumes in its disk usage figures

* [kept artifacts cannot be unkept](https://gitlab.com/gitlab-org/gitlab/-/issues/289954)

## Monitoring and testing

To test a runner, it can be registered only with a project, to run