The images we are building are built exclusively for the amd64 architecture. This means that when you try to run any executable from a built image on a non-amd64 architecture, you get an "Exec format error".
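For context: when the kernel is asked to exec a binary built for another CPU, and no binfmt_misc handler claims it, execve fails with ENOEXEC, whose message is "Exec format error". The target CPU is recorded in the ELF header's `e_machine` field, so a quick diagnostic is to read that field. This is a minimal sketch for checking which arch a binary from an image targets; the helper names are made up for illustration and only a few `e_machine` values are mapped.

```python
import struct

# e_machine values from the ELF specification, mapped to Debian arch names.
# Only a few architectures are listed here; extend as needed.
ELF_MACHINES = {
    3: "i386",     # EM_386
    62: "amd64",   # EM_X86_64
    183: "arm64",  # EM_AARCH64
}


def elf_arch(header: bytes) -> str:
    """Return the architecture an ELF binary targets, given its first 20 bytes."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian.
    endian = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18 in both 32- and 64-bit ELF files.
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    return ELF_MACHINES.get(machine, f"unknown ({machine})")


def would_exec_natively(path: str, host_arch: str) -> bool:
    """True if the binary at `path` targets `host_arch`, i.e. needs no emulation."""
    with open(path, "rb") as f:
        return elf_arch(f.read(20)) == host_arch
```

Run against, say, `/bin/sh` from an extracted image on an arm64 host, `would_exec_natively(path, "arm64")` returning False is exactly the case that ends in "Exec format error".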
In order to fix this, multi-arch builds have to be done for the containers. There are a few ways to accomplish this: Podman has buildx, which can be used for this, but it's also possible to pass the right variables to a podman build to build each required architecture, and then assemble the manifest list.
The problem is that we aren't simply doing a 'podman build': we run a shell script that runs mmdebstrap and pipes its output to podman import. Fortunately, mmdebstrap has the --architectures option, but in order to run the shell script in the first place, we have to use a multi-arch enabled image so the shell executable can run on that architecture. I do not think it is sufficient to simply pass all the architectures to mmdebstrap: it treats the first one as the chroot's native architecture and only adds the rest as foreign dpkg architectures.
We'll either need to figure out how to build each architecture separately and then assemble the manifests, or cross-compile all of them at once.
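The "build each architecture separately, then assemble the manifests" route could look roughly like the sketch below, which only returns the shell commands it would run. Assumptions: the image name and the `<suite>-<arch>` tags are hypothetical, each non-native arch still needs qemu/binfmt support on the build host, and each entry is annotated with `podman manifest add --arch` because an imported tarball carries no architecture metadata of its own.

```python
import shlex


def build_plan(suite: str, image: str, archs: list) -> list:
    """Return shell commands that build one image per arch and join them in a manifest list.

    Hypothetical naming: per-arch images are tagged <image>:<suite>-<arch>,
    and the manifest list itself is <image>:<suite>.
    """
    q = shlex.quote
    manifest = f"{image}:{suite}"
    cmds = [f"podman manifest create {q(manifest)}"]
    for arch in archs:
        tag = f"{image}:{suite}-{arch}"
        # One mmdebstrap run per arch, so each arch is the chroot's native one.
        # TARGET "-" makes mmdebstrap write a tarball to stdout.
        cmds.append(
            f"mmdebstrap --architectures={q(arch)} {q(suite)} - "
            f"| podman import - {q(tag)}"
        )
        # Record the arch explicitly on the manifest list entry.
        cmds.append(
            f"podman manifest add --arch {q(arch)} "
            f"{q(manifest)} {q('containers-storage:' + tag)}"
        )
    return cmds
```

For example, `build_plan("bookworm", "localhost/debian-base", ["amd64", "arm64"])` yields one create command followed by an mmdebstrap/import pair and a manifest add for each arch.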
We do have access to non-x86 builders, so that's the first thing we could do, e.g. build arm64 images. For i386 (32-bit) we might be able to get away with multiarch...
This issue is unlabelled after 5 days. It needs attention. Please take care of this before the end of 2023-11-12, otherwise it will be moved to the Icebox.
To make the bot ignore this ticket, add the bot-ignore label.
@micah we typically use bot-ignore for Needs Information tickets that we know won't get info for two weeks... this seems more like an actual Icebox case. otherwise, feel free to triage back into Backlog or Next if you feel it's higher priority... thanks and sorry for the noise!
trying to parse what you mean here... i do think that this issue won't get information for a couple weeks, which suggests it should be Needs Information and bot-ignore, but you said that this seems like an Icebox case, and I'm not sure I see the distinction.
i mark issues with Needs Information when I (or you, in this case) need extra information from the ticket submitter to do the work.
Here we're not waiting for any extra information: we need to do some investigative work to figure out the multi-arch stuff. It's something that just needs to be done, and when it's done, then the ticket is closed and we move on merrily to the next ticket.
In other words, the difference between a Needs Information ticket and an Icebox ticket is that the Icebox ticket won't ever harass the submitter for more information, which I think is fine in this context.
In this case, you were pinged by the bot not because it was a Needs Information ticket, but because it wasn't labeled at all.
Had a quick stab at this for fun. I believe we can build foreign images with mmdebstrap in an unprivileged rootless container but this requires two things:
1. qemu-user-static and binfmt-support packages installed
2. the host's /proc/sys/fs/binfmt_misc mounted as a volume inside the container
Currently we don't have any runners that satisfy the second requirement, but we should look into it in the context of team#41044.
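A runner could check the binfmt_misc requirement before attempting a foreign build. Each registered handler appears as a file under /proc/sys/fs/binfmt_misc whose first line is `enabled` or `disabled`, followed by lines such as `interpreter ...` and `flags: ...`. The `F` ("fix binary") flag makes the kernel open the interpreter at registration time, so it keeps working inside a container where that interpreter path doesn't exist. This is a sketch; the handler name `qemu-aarch64` assumes the naming used by Debian's qemu-user-static package, and the function names are made up.

```python
from pathlib import Path

BINFMT_DIR = Path("/proc/sys/fs/binfmt_misc")


def parse_binfmt_entry(text: str) -> dict:
    """Parse one /proc/sys/fs/binfmt_misc/<name> entry into a dict."""
    lines = text.strip().splitlines()
    entry = {"enabled": lines[0].strip() == "enabled"}
    for line in lines[1:]:
        key, _, value = line.partition(" ")
        entry[key.rstrip(":")] = value.strip()  # "flags:" -> "flags"
    return entry


def foreign_arch_ready(handler: str, root: Path = BINFMT_DIR) -> bool:
    """True if `handler` (e.g. "qemu-aarch64") is registered, enabled and uses the F flag.

    Inside a container this only sees something if the host's binfmt_misc
    is mounted at `root`.
    """
    path = root / handler
    if not path.is_file():
        return False
    entry = parse_binfmt_entry(path.read_text())
    return entry["enabled"] and "F" in entry.get("flags", "")
```

A job could call `foreign_arch_ready("qemu-aarch64")` and skip the arm64 build with a clear message instead of failing midway through mmdebstrap.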
Lately, I started thinking about switching some of our CI jobs to containers.torproject.org/tpo/tpa/base-images/debian:stable. I think depending on each particular job's tags, this could lead to failures in the following scenario:
- The job is tagged with a tag shared by the tpa and osuosl runners, e.g. docker or kvm
- The job is not tagged tpa or amd64
In this context, because none of the job's tags exclude them, our non-amd64 runners at osuosl would be candidates to pick up this job, and when one of them does, the job fails with an Exec format error.
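The scenario above follows from how GitLab matches jobs to runners: a tagged job can go to any runner that carries all of the job's tags, while an untagged job only goes to runners configured to run untagged jobs. A simplified model (it ignores project assignment, locking and protected runners) makes the failure mode concrete; the runner names and tag sets below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Runner:
    name: str
    arch: str
    tags: frozenset
    run_untagged: bool = False


def can_pick(job_tags: set, runner: Runner) -> bool:
    """Untagged jobs go only to runners accepting untagged jobs;
    tagged jobs need every job tag to be present on the runner."""
    if not job_tags:
        return runner.run_untagged
    return job_tags <= runner.tags


# Hypothetical inventory, loosely modelled on the tpa/osuosl split.
RUNNERS = [
    Runner("tpa-x86", "amd64", frozenset({"tpa", "amd64", "docker", "kvm"}), run_untagged=True),
    Runner("osuosl-arm", "arm64", frozenset({"arm64", "docker", "kvm"})),
    Runner("osuosl-ppc", "ppc64el", frozenset({"ppc64el", "docker"})),
]


def wrong_arch_runners(job_tags: set, image_arch: str = "amd64") -> list:
    """Runners eligible for the job that can't natively run an `image_arch` image."""
    return [r.name for r in RUNNERS if can_pick(job_tags, r) and r.arch != image_arch]
```

With this model, `wrong_arch_runners({"docker"})` returns `["osuosl-arm", "osuosl-ppc"]`, while adding `amd64` (or `tpa`) to the job's tags narrows it to an empty list, which is the second option listed below.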
To avoid this we would either need to:
- Build and host images for all available runner architectures
- Ensure jobs using these base images remain untagged, or are tagged with tpa and/or amd64
- Keep only architecture-indicating tags on the osuosl runners, and remove all others
Perhaps this would be a good excuse to just get started already with tiered runners. Create ci-runner-x86-01 in Ganeti as a smallish runner but only registered to key TPA projects. We could then mount the required volumes or even enable privileged mode, whatever's needed to build these multi-arch images.
Another reason to want this is when we want a job to run on an image built for a specific arch, whereas I think the above is about jobs that could hypothetically run on any available arch.
For such jobs I think it'd be preferable to have completely separate images instead of using multi-arch images, since podman caching permits substituting different archs from a multi-arch image, potentially resulting in running on an i386 image when you expected amd64 or vice versa. See team#41621
e.g. it'd be nice if we followed something like the dockerhub convention of having arch-specific images with the arch in the path, like containers.torproject.org/tpo/tpa/base-images/<arch>/debian:stable
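A job wanting an arch-specific image under that proposed layout would need to map the machine it runs on to the `<arch>` path component. A sketch, assuming Debian architecture names are used for `<arch>` (Docker Hub itself uses slightly different names, such as `arm64v8`); the path layout is the proposal above, not something that exists today, and the function name is made up.

```python
import platform
from typing import Optional

# `uname -m` machine names mapped to Debian architecture names (partial list).
UNAME_TO_DEBIAN = {
    "x86_64": "amd64",
    "aarch64": "arm64",
    "i686": "i386",
    "ppc64le": "ppc64el",
    "s390x": "s390x",
}

# Proposed layout; this path is hypothetical until such images are published.
REGISTRY_PREFIX = "containers.torproject.org/tpo/tpa/base-images"


def arch_image(image: str, machine: Optional[str] = None) -> str:
    """Return <prefix>/<debian-arch>/<image> for the given (or current) machine."""
    machine = machine or platform.machine()
    arch = UNAME_TO_DEBIAN.get(machine)
    if arch is None:
        raise ValueError(f"no Debian arch mapping for machine {machine!r}")
    return f"{REGISTRY_PREFIX}/{arch}/{image}"
```

For example, `arch_image("debian:stable", "aarch64")` gives `containers.torproject.org/tpo/tpa/base-images/arm64/debian:stable`. Because the arch is part of the name, a cached image can't silently stand in for another arch the way a multi-arch tag can.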
LMK if I should file a separate issue, since I'm specifically not asking for multi-arch images. I figured there was enough overlap in wanting supported images for other archs that I'd start by commenting here.
is i386 really that much of a thing anymore? even distributions like Debian are considering dropping it entirely as a supported architecture...
looking at aarch64 (ARM), i don't think this applies, because you can't run aarch64 on amd64 and vice versa, so it's not like podman is just going to pick the wrong image there: you'll need to target the right runner...
We currently test c-tor on i386, though I'm not opposed to revisiting that.
In general we might not run into the multi-arch caching footguns given the set of platforms we currently care about, but it'd be nice if we could just avoid using multi-arch images altogether until if and when it's a bit less footgun-ful.
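Until then, one cheap guard against the caching footgun from team#41621 is to check what actually landed in local storage before a job uses it. `podman image inspect` prints a JSON array whose objects include an `Architecture` field; note that it uses Go/OCI arch names, so i386 shows up as `386`. A sketch, with made-up helper names:

```python
import json
import subprocess


def image_arch(inspect_output: str) -> str:
    """Return the Architecture field from `podman image inspect` JSON output."""
    images = json.loads(inspect_output)  # a JSON array, one object per image
    return images[0]["Architecture"]


def require_image_arch(image: str, expected: str) -> None:
    """Raise if the locally stored `image` isn't built for `expected` (OCI arch name)."""
    result = subprocess.run(
        ["podman", "image", "inspect", image],
        check=True, capture_output=True, text=True,
    )
    actual = image_arch(result.stdout)
    if actual != expected:
        raise RuntimeError(f"{image} is {actual}, expected {expected}")
```

Calling `require_image_arch(image, "amd64")` at the start of a job turns a silent i386 substitution into an immediate, explicit failure.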
i wonder if it's related to this issue, but since the bookworm upgrade of the Ganeti cluster, our capacity to run i386 images has improved (see #41656).
obviously, it doesn't provide us with builds for all architectures that we currently support (and we have quite a few runners across different architectures: according to our docs, we have 5!)
i'm skeptical that we actually have a real i386 server, as that platform is being dropped from supported platforms in lots of places...
anyways, my point is we have at least 4 hardware platforms running CI runners right now. three of those are OSUOSL runners, which, fair, maybe we don't trust with building our images, but those are a thing!
i think that images built on those runners would naturally inherit the architecture of the host, and could be tagged accordingly. i'm not sure the current build harness here supports this, but i figured i would mention it.
so i guess my point is i am not sure i would bother with cross-building images; i would just build them on machines that natively support each architecture.
i will also point out that since we enabled building more versions of our container images, each pipeline now runs a whopping 22 jobs, including 16 "other" containers: one for each combination of supported debian release (bullseye, bookworm, trixie and sid) and image type (golang, python, podman, redis-server). adding architecture to that matrix might make things slightly unwieldy, as with four architectures those 16 jobs alone would become 64 on each build.
(we might be able to save on that by taking out, say, podman and redis-server since i'm not sure those need to be built for all debian releases... we should also probably remove bullseye from the base images eventually, see #19 (closed).)
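The job-count arithmetic above can be checked with a small sketch. The release and image-type lists come from the comment; the architecture names are placeholders (four platforms, matching the 64-job estimate), and the pruning rule, building podman and redis-server for a single release, is a hypothetical example of the saving suggested above.

```python
from itertools import product

RELEASES = ["bullseye", "bookworm", "trixie", "sid"]
IMAGE_TYPES = ["golang", "python", "podman", "redis-server"]
# Placeholder arch names: four platforms, matching the 64-job estimate.
ARCHS = ["amd64", "arm64", "arch3", "arch4"]


def full_matrix(releases, image_types, archs):
    """One build job per (release, image type, arch) combination."""
    return list(product(releases, image_types, archs))


def pruned_matrix(archs, single_release="bookworm"):
    """Hypothetical pruning: build podman and redis-server for one release only."""
    per_release = list(product(RELEASES, ["golang", "python"], archs))
    single = list(product([single_release], ["podman", "redis-server"], archs))
    return per_release + single


print(len(full_matrix(RELEASES, IMAGE_TYPES, ["amd64"])))  # 16: today's "other" jobs
print(len(full_matrix(RELEASES, IMAGE_TYPES, ARCHS)))      # 64
print(len(pruned_matrix(ARCHS)))                           # 40
```

Dropping bullseye as well (see #19) would bring the full matrix down to 3 × 4 × 4 = 48.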