Consider reducing artifact disk space usage
Summary
We're having issues with disk space usage on the GitLab server, and artifacts are currently the main offender. This project and (mostly) its forks take up a significant chunk of the space on the server.
For example, here are the top five core/tor forks (including core/tor itself) and their disk usage:
- dgoulet/tor: 9.3GB
- core/tor: 5.9GB
- mikeperry/tor: 4.6GB
- nickm/tor: 4GB
- ahf/tor: 3.3GB
If I count this right, this adds up to about 27GB, which is over 10% of the disk capacity before we raised it in an emergency last weekend.
(Note: it's actually fairly difficult to come up with good numbers for this. The GitLab admin interface doesn't allow us to filter by fork, and the name-based search is limited. There are possibly a few more forks that take up space, some of them a few gigabytes each, so that total is a conservative estimate.)
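One way to get more reliable numbers than the admin interface might be to query the REST API directly. The sketch below is a minimal illustration, not something that has been run against this instance. It assumes a personal access token with enough privileges to see project statistics, the `GET /projects/:id/forks` and `GET /projects/:id?statistics=true` endpoints, and the `job_artifacts_size` field (in bytes). The `API` base URL and the function names are hypothetical placeholders.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical instance URL: replace with the real GitLab API base.
API = "https://gitlab.example.org/api/v4"


def api_get(path, token, params=None):
    """GET one page of a GitLab API path and decode the JSON response."""
    query = "?" + urllib.parse.urlencode(params) if params else ""
    req = urllib.request.Request(API + path + query, headers={"PRIVATE-TOKEN": token})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def fork_artifact_sizes(project, token):
    """Return {path_with_namespace: job_artifacts_size in bytes} for a project
    and its direct forks (forks of forks are not followed)."""
    pid = urllib.parse.quote(project, safe="")
    ids = [pid]
    page = 1
    while True:  # walk the paginated fork list until an empty page comes back
        forks = api_get(f"/projects/{pid}/forks", token, {"per_page": 100, "page": page})
        if not forks:
            break
        ids.extend(str(f["id"]) for f in forks)
        page += 1
    sizes = {}
    for i in ids:
        p = api_get(f"/projects/{i}", token, {"statistics": "true"})
        # statistics is omitted when the token lacks sufficient access
        sizes[p["path_with_namespace"]] = p.get("statistics", {}).get("job_artifacts_size", 0)
    return sizes


def top_consumers(sizes, n=5):
    """Return the n largest projects by artifact size and their total in decimal GB."""
    top = sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:n]
    total_gb = sum(size for _, size in top) / 1e9
    return top, total_gb
```

Usage would be along the lines of `top_consumers(fork_artifact_sizes("core/tor", token))`, with the project path written as it appears on the instance. Keeping the network calls separate from the aggregation makes the ranking easy to check offline.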
Relevant logs and/or screenshots
Possible fixes
I am not certain. Other projects we have talked to have lowered their retention period to one week (jnewsome/sponsor-61-sims#13 (closed)) or one hour (tpo/tpa/team#40616 (closed)); the latter helped tremendously. But you already have a short retention period (one week), so I'm not sure how to fix this.
I'm opening this issue mainly so that the team is aware of the problem, and in the hope that you have ideas on how to fix it.
I wonder, for example, whether forks need those artifacts at all... I am not sure whether we can disable artifact retention on forks, but if that's an option, we could figure out a way to purge the existing ones. Otherwise, maybe we could consider a one-day retention period? The latest artifacts are kept for the most recent successful job on each ref, so that should already cover quite a bit.
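To see why a shorter window matters even with "latest" artifacts pinned, here is a rough steady-state model: expiring storage grows linearly with the retention period, while the pinned artifacts (one latest successful pipeline per ref) form a fixed floor. This is a back-of-the-envelope sketch, and every input below (pushes per day, artifact size per pipeline, number of refs) is a hypothetical placeholder, not a measurement from this project.

```python
MB = 10**6


def steady_state_bytes(pushes_per_day, bytes_per_pipeline, retention_days, refs):
    """Approximate artifact storage for one project once expiry has kicked in.

    expiring: everything produced inside the retention window.
    pinned:   the latest successful pipeline per ref, kept regardless of expiry.
    Pipelines that are both recent and "latest" are counted twice, so this is
    an upper bound rather than an exact figure.
    """
    expiring = pushes_per_day * bytes_per_pipeline * retention_days
    pinned = refs * bytes_per_pipeline
    return expiring + pinned


# Hypothetical inputs: 20 pushes/day, 15 jobs x 12MB each, 50 refs.
per_pipeline = 15 * 12 * MB
week = steady_state_bytes(20, per_pipeline, 7, 50)
day = steady_state_bytes(20, per_pipeline, 1, 50)
```

With these made-up inputs, a one-week window comes to 34.2GB and a one-day window to 12.6GB, of which 9GB is the pinned floor. The point is the shape: shortening retention only shrinks the linear term, so the pinned artifacts bound how much a one-day period can save.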
I also noticed that you have a lot of jobs running, possibly in parallel. Would there be a way to reuse artifacts across those jobs to reduce disk usage? For example, in this pipeline, every job except debian-distcheck, debian-docs and debian-tracing generates a 10-15MB binary on every push. Those seem small at first, but they add up quickly... Granted, those binaries differ from one job to the next, so a better example may be debian-distcheck and debian-tracing, which both seem to generate an identical .tar.gz artifact. It may seem like I'm grasping at straws here, and that could very well be what I'm doing... but I guess I'm surprised that source code and simple binary artifacts add up so quickly.
I guess the TL;DR question here is: how long do you really need artifacts for? Could we reduce retention to a day, knowing that the latest artifacts are kept regardless? And is there some duplication we could reduce?
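To check how much duplication there actually is, one option would be to download a pipeline's artifacts into one directory per job and group files by content hash. A minimal sketch, assuming such a local layout (the directory structure is hypothetical): files with the same SHA-256 digest are byte-identical, so this only catches exact duplicates, not near-identical archives.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large artifacts don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Group files under root by content hash; keep only groups with 2+ files."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}


def wasted_bytes(duplicates):
    """Bytes that could be saved by keeping a single copy of each duplicated file."""
    return sum(paths[0].stat().st_size * (len(paths) - 1) for paths in duplicates.values())
```

Run against a directory like `artifacts/<job-name>/...`, `wasted_bytes(find_duplicates("artifacts"))` would give an upper bound on what deduplicating one pipeline could save.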
Thanks!