$ apt-get install -y ca-certificates git
...
After this operation, 101 MB of additional disk space will be used.
E: You don't have enough free space in /builds/jnewsome/sponsor-61-sims/job-cache/apt/.
Cleaning up project directory and file based variables 00:01
ERROR: Job failed: exit code 1
thanks for the ticket, @jnewsome ... just to be sure, this is not directly related to #40476 (closed), because this is not the cache volume you intend to use for sims?
i am starting to think we just need to grow the disk on that runner...
... and that's just the top 10. I haven't yet made a thing that digests all of those caches and spits out the per project disk usage (but it's on my roadmap :p).
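(a first stab at that would probably be something like the following, run on the runner itself — the path is the default docker volume location, and figuring out which volume belongs to which project is exactly the part i still need to write:)

# rough sketch only: list the biggest docker volumes the runner keeps around
du -s /var/lib/docker/volumes/*/_data 2>/dev/null | sort -n | tail -20
# ... and the cache volumes by name, to map back to projects somehow
docker volume ls --format '{{.Name}}' | grep -i cache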
The TL;DR is a problem @lavamind correctly identified before: the @tpo/core people are taking up a lot of disk space on the runners. :) we need to figure out ways to better deal with the caching here.
could one way of solving this be to enable the gitlab registry (gitlab#89 (closed)), so that instead of relying on the cache, the core tor people could rely on docker images for their base stuff?
it might seem like just shifting the problem elsewhere, but docker images are easier to reuse and cleanup than caches...
otherwise we could just throw hardware at the problem, but we actually don't have that much free disk space on that cluster, unless we start sucking things out of the SAN, but that is kind of painful to configure, so i'm procrastinating a bit on that.
could one way of solving this be to enable the gitlab registry (gitlab#89 (closed)), so that instead of relying on the cache, the core tor people could rely on docker images for their base stuff?
I'm not familiar with how exactly these projects are using cache, but my 2c from the sponsor 67 (shadow sims) project: I have a lot of custom dependencies (shadow, tgen, a patched tor, oniontrace, ...) that need to be built before running the simulation. Initially I was baking those into a custom Docker image, pushing it to dockerhub, and then having the CI pull that image. That turned out to be a big headache though - having to locally build and push a big image any time one of the deps is changed/tweaked is annoying, and easy to get wrong or forget to do. Putting each of those as a job in the CI with a cached result is much easier to manage.
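(To give an idea of the shape of those jobs — the version, paths and repo below are just illustrative, not the actual config; the job's cache: entry keys on the version, so only that one dependency gets rebuilt when its version changes:)

# sketch of one per-dependency build job (build deps assumed already installed)
TOR_VERSION=0.4.6.8                     # illustrative version
PREFIX="$CI_PROJECT_DIR/job-cache/tor"  # directory listed under cache: paths
if [ ! -x "$PREFIX/bin/tor" ]; then
  git clone --depth 1 --branch "tor-$TOR_VERSION" https://gitlab.torproject.org/tpo/core/tor.git
  (cd tor && ./autogen.sh && ./configure --prefix="$PREFIX" && make -j"$(nproc)" install)
fi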
The idea behind this is not only to enable the image registry, but also to allow users to build and push their own images from within CI, something which is currently not possible because it requires a privileged Docker instance.
in other words, if your deps are being built automatically as part of a scheduled CI run, would that help?
it could still be the same CI workflow, just split into multiple jobs... no?
Ah yeah, I guess if you support building Docker images from within the CI that's true.
Wouldn't we need to support Docker-in-Docker to be able to build Docker images from inside gitlab jobs, though? I briefly looked at that and got the impression that required the "outer" docker to run in privileged mode, but maybe there's a way to do it without it.
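(For example, I've seen kaniko mentioned as a way to build and push without a privileged daemon — very roughly, and with the destination as a placeholder:)

# runs inside the gcr.io/kaniko-project/executor image; no docker socket needed
/kaniko/executor \
  --context "$CI_PROJECT_DIR" \
  --dockerfile "$CI_PROJECT_DIR/Dockerfile" \
  --destination "$CI_REGISTRY_IMAGE:latest"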
It's also a bit less granular than caches. e.g. right now if I change just the Shadow version or just the Tor version, I'll only have a cache miss for that build, but still have a hit for the other. Not necessarily a deal-breaker, but a downside.
I'm also unclear why it's easier to manage Docker images than caches, but you would know better than me :)
Wouldn't we need to support Docker-in-Docker to be able to build Docker images from inside gitlab jobs, though? I briefly looked at that and got the impression that required the "outer" docker to run in privileged mode, but maybe there's a way to do it without it.
we do need some sort of DIND to build images, yes. i managed to build an image from scratch inside our CI with plain docker import (which doesn't require anything magic like namespaces), but that's not the typical way you build images. details in gitlab#90 (closed).
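(roughly, the shape of that trick — the real recipe is in gitlab#90, and the names here are made up:)

# build a root filesystem with ordinary packaging tools, then stream it into
# docker import, which only needs a reachable docker daemon, not a privileged build
debootstrap --variant=minbase bullseye ./rootfs http://deb.debian.org/debian
tar -C ./rootfs -c . | docker import - my-base:latest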
It's also a bit less granular than caches. e.g. right now if I change just the Shadow version or just the Tor version, I'll only have a cache miss for that build, but still have a hit for the other. Not necessarily a deal-breaker, but a downside.
not sure what you mean there, but surely we could replicate that system with a multi-layered Docker image, e.g. the Shadow image would be built FROM tor?
I'm also unclear why it's easier to manage Docker images than caches, but you would know better than me :)
well at least one way images are easier is that they are centralized in the registry, as opposed to caches which are spread around the runners. so i only have to manage one beefy disk instead of multiple ones, which is one of the things that makes me hesitant to grow the runner right now (if i grow that runner, people will start expecting runners to be big).
the other is that, quite frankly, it's harder for you people to fill up the container registry than the caches, because it's more of a pain in the back to build container images than just write to cache. :p granted, that's kind of a BOFH move, but it's still a reality.
(i was originally worried about opening the registry exactly for the reverse reason, but now that people have bled all over the caches and i've grown the disk on the gitlab server, i'm worried about the opposite.) :)
not sure what you mean there, but surely we could replicate that system with a multi-layered Docker image, e.g. the Shadow image would be built FROM tor?
I was thinking everything after the first "miss"/change would need to be rebuilt, but I think that can be avoided using a multi-image build, at the cost of more docker complexity....
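(Something like this, I suppose — the image names are made up, just to show the layering: the shadow image is built FROM the tor image, so bumping only the shadow version reuses the already-pushed tor layers.)

cat > Dockerfile.shadow <<'EOF'
# hypothetical layering: a tor base image that rarely changes, shadow built on top
FROM registry.example.org/jnewsome/tor:0.4.6.8
COPY shadow/ /src/shadow/
RUN cd /src/shadow && ./setup build && ./setup install
EOF
docker build -f Dockerfile.shadow -t registry.example.org/jnewsome/sim-base:latest .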
as opposed to caches which are spread around the runners
the other is that, quite frankly, it's harder for you people to fill up the container registry than the caches, because it's more of a pain in the back to build container images than just write to cache.
I mean, yeah, this doesn't seem like a very compelling reason. Assuming the caches are actually useful, this means either spending more engineering/maintenance effort to switch things over to Docker images (and ending up using roughly the same amount of storage again), or losing the benefit of caching.
Yeah, that requires me setting up an S3 cluster. :)
the other is that, quite frankly, it's harder for you people to fill up the container registry than the caches, because it's more of a pain in the back to build container images than just write to cache.
I mean, yeah, this doesn't seem like a very compelling reason. Assuming the caches are actually useful, this means either spending more engineering/maintenance effort to switch things over to Docker images (and ending up using roughly the same amount of storage again), or losing the benefit of caching.
The thing with docker images is that they somewhat force you into a better workflow. And they actually provide a useful feature on their own, e.g. i think we should definitely have an official, solid, and constantly updated image for at least tor, but also arti, etc...
In other words, having those docker images would benefit more than just the sysadmins complaining about caches, IMHO. :)
... and that's just the top 10. I haven't yet made a thing that digests all of those caches and spits out the per project disk usage (but it's on my roadmap :p).
i looked at just doubling the disk using the local SAS drives on that VM, and that can't be done: the node is full.
root@chi-node-01:~# gnt-instance grow-disk ci-runner-01.torproject.org 2 100G
Failure: prerequisites not met for this operation:
error type: insufficient_resources, error details:
Not enough disk space on target node chi-node-03.torproject.org vg vg_ganeti: required 102400 MiB, available 50656 MiB
now i could start juggling VMs around to free up some space, but there's a nice little nugget i could kill before that, shadow-01: #40498 (closed). this is going to take a little while because our retirement process enforces some delays, but it should give us some breathing room soon-ish.
root@chi-node-01:~# gnt-instance grow-disk ci-runner-01.torproject.org 2 100G
Mon Nov 1 21:02:17 2021 Growing disk 2 of instance 'ci-runner-01.torproject.org' by 100.0G to 200.0G
Mon Nov 1 21:02:20 2021 - INFO: Waiting for instance ci-runner-01.torproject.org to sync disks
Mon Nov 1 21:02:20 2021 - INFO: - device disk/2: 0.10% done, 2h 20m 55s remaining (estimated)
Mon Nov 1 21:03:20 2021 - INFO: - device disk/2: 2.10% done, 56m 19s remaining (estimated)
i still need to resize the underlying filesystem, and i'm heading out so that might not actually be finished before tomorrow, but it should give us some breathing room.
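(the resize itself should just be something like this once the sync finishes — the device name is a guess and needs double-checking inside the VM:)

lsblk                # confirm the grown disk now reports 200G
resize2fs /dev/sdc   # grow the ext4 filesystem online to fill the device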
As several of the lingering docker volumes appear to contain stuff from /builds, we should probably take note of the following recommendation from GitLab:
GitLab Runner does not stop you from storing things inside of the Builds Directory. For example, you can store tools inside of /builds/tools that can be used during CI execution. We HIGHLY discourage this, you should never store anything inside of the Builds Directory. GitLab Runner should have total control over it and does not provide stability in such cases. If you have dependencies that are required for your CI, we recommend installing them in some other place.
how do we act on this though? in other words, is it that jobs explicitly store stuff in there, or do we need to explicitly opt out?