Multi-arch images

We do have access to non-x86 builders, so a first thing we could do is build arm64 images, for example. For i386 (32-bit) we might be able to get away with multiarch...

This issue is unlabelled after 5 days. It needs attention. Please take care of this before the end of 2023-11-12, otherwise it will be moved to the Icebox.

To make the bot ignore this ticket, add the bot-ignore label.

added Stale label

removed Stale label

added 1 deleted label

added Icebox label and removed 1 deleted label

@micah we typically use bot-ignore for Needs Information tickets that we know won't get info for two weeks... this seems more like an actual Icebox case. otherwise, feel free to triage back into Backlog or Next if you feel it's higher priority... thanks and sorry for the noise!

trying to parse what you mean here... i do think that this issue won't get information for a couple weeks, which suggests it should be Needs Information and bot-ignore. but you said that this seems like an Icebox case, and I'm not sure what the differentiation is there.

i mark issues with Needs Information when I (or you, in this case) am in need of extra information from the ticket submitter to do my work.

Here we're not waiting for any extra information: we need to do some investigative work to figure out the multi-arch stuff. It's something that just needs to be done, and when it's done, then the ticket is closed and we move on merrily to the next ticket.

In other words, the difference between a Needs Information ticket and an Icebox ticket is that the Icebox ticket won't harass the submitter for more information, ever, which I think is fine in this context.

In this case, you were pinged by the bot not because it was a Needs Information ticket, but because it wasn't labeled at all.

thank you for the explanation.

i was confused because we need more information to do the work, but I get the difference now.

Had a quick stab at this for fun. I believe we can build foreign images with mmdebstrap in an unprivileged rootless container, but this requires two things:

1. qemu-user-static and binfmt-support packages installed
2. the host's /proc/sys/fs/binfmt_misc mounted as a volume inside the container

Currently we don't have any runners that satisfy 2., but we should look into it in the context of team#41044
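A minimal sketch of how a job could check the second requirement from inside the container: it lists the enabled binfmt_misc handlers visible at the mount point. The handler naming (`qemu-<arch>`) is an assumption based on how qemu-user-static usually registers itself, and both function names are hypothetical, not part of any existing tooling here.

```python
from pathlib import Path


def enabled_binfmt_handlers(binfmt_dir="/proc/sys/fs/binfmt_misc"):
    """Return {handler_name: interpreter_path} for enabled binfmt_misc entries."""
    handlers = {}
    root = Path(binfmt_dir)
    if not root.is_dir():
        # binfmt_misc is not mounted, or not visible inside this container.
        return handlers
    for entry in sorted(root.iterdir()):
        # "register" and "status" are control files, not handlers.
        if entry.name in ("register", "status") or not entry.is_file():
            continue
        try:
            lines = entry.read_text().splitlines()
        except OSError:
            continue
        # The first line of a handler entry is "enabled" or "disabled".
        if not lines or lines[0].strip() != "enabled":
            continue
        for line in lines[1:]:
            if line.startswith("interpreter "):
                handlers[entry.name] = line.split(" ", 1)[1].strip()
    return handlers


def can_run_foreign(arch, binfmt_dir="/proc/sys/fs/binfmt_misc"):
    """Assumption: handlers are named qemu-<arch>, e.g. qemu-aarch64."""
    return f"qemu-{arch}" in enabled_binfmt_handlers(binfmt_dir)
```

A build job could call `can_run_foreign("aarch64")` before invoking mmdebstrap and fail early with a clear message instead of an Exec format error.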

i wonder if this could help with the i386 issues from team#41242 (closed)...

also:

> the host's /proc/sys/fs/binfmt_misc mounted as a volume inside the container
>
> Currently we don't have any runners that satisfy 2., but we should look into it in the context of team#41044

is /proc/sys/fs/binfmt_misc a security issue if mounted inside the container?

> i wonder if this could help with the i386 issues from team#41242 (closed)...

Unclear, but as far as I know, amd64 should be able to run i386 stuff without any special binfmt stuff.

> is /proc/sys/fs/binfmt_misc a security issue if mounted inside the container?

I don't know.

Lately, I started thinking about switching some of our CI jobs to containers.torproject.org/tpo/tpa/base-images/debian:stable. I think depending on each particular job's tags, this could lead to failures in the following scenario:

Job is tagged with a tag shared by tpa and osuosl runners, e.g. docker or kvm
Job is not tagged tpa or amd64

In this context, because the job is tagged, our non-amd64 runners at osuosl would be candidates to pick up this job, and when that happens, the job would fail with an Exec format error.

To avoid this we would either need to:

Build and host images for all available runner architectures
Ensure images using these base images remain untagged, or tagged with tpa and/or amd64
Keep only architecture-indicating tags on the osuosl runners, and remove all others
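The failure scenario and the tag-based fixes above can be illustrated with a small model of runner selection. This is a sketch only: it assumes a runner is a candidate when all of the job's tags are in the runner's tag list, and that untagged jobs only go to runners that accept them. The runner names, their tags, and the `run_untagged` values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Runner:
    name: str
    arch: str
    tags: frozenset
    run_untagged: bool = False


def candidates(job_tags, runners):
    """Runners that may pick up a job with the given tags."""
    job_tags = frozenset(job_tags)
    if not job_tags:
        return [r for r in runners if r.run_untagged]
    return [r for r in runners if job_tags <= r.tags]


def exec_format_risk(job_tags, runners, image_arch="amd64"):
    """Names of candidate runners whose arch differs from the image's arch."""
    return [r.name for r in candidates(job_tags, runners) if r.arch != image_arch]


# Hypothetical runner fleet, for illustration only.
RUNNERS = [
    Runner("tpa-amd64", "amd64", frozenset({"tpa", "amd64", "docker"}), run_untagged=True),
    Runner("osuosl-aarch64", "aarch64", frozenset({"osuosl", "aarch64", "docker"})),
    Runner("osuosl-ppc64le", "ppc64le", frozenset({"osuosl", "ppc64le", "docker"})),
]

print(exec_format_risk(["docker"], RUNNERS))           # shared tag only: osuosl runners qualify
print(exec_format_risk(["docker", "amd64"], RUNNERS))  # adding amd64 excludes them
print(exec_format_risk([], RUNNERS))                   # untagged job stays on the tpa runner
```

In this model, a job tagged only `docker` can land on either osuosl runner, while adding `amd64` or leaving the job untagged keeps it on the amd64 runner.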

Perhaps this would be a good excuse to just get started already with tiered runners. Create ci-runner-x86-01 in Ganeti as a smallish runner but only registered to key TPA projects. We could then mount the required volumes or even enable privileged mode, whatever's needed to build these multi-arch images.

assigned to @lavamind

mentioned in issue team#41621

mentioned in issue Diziet/rust-derive-deftly#97

Another reason to want this is for jobs that need to run on an image built for a specific arch, whereas I think the above is talking about jobs that could hypothetically run on any available arch.

For such jobs I think it'd be preferable to have completely separate images instead of using multi-arch images, since podman caching permits substituting different archs from a multi-arch image, potentially resulting in running on an i386 image when you expected amd64 or vice versa. See team#41621

e.g. it'd be nice if we followed something like the dockerhub convention of having arch-specific images with the arch in the path, like containers.torproject.org/tpo/tpa/base-images/<arch>/debian:stable
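As a rough sketch of what that convention could look like from a job's point of view, the helper below maps a machine name to an arch directory and builds the image reference. The alias table and the choice of canonical names (taken from the runner architecture names used elsewhere in this thread) are assumptions, and no such arch-specific paths exist yet.

```python
import platform

# Assumed prefix, taken from the path suggested above.
REGISTRY_PREFIX = "containers.torproject.org/tpo/tpa/base-images"

# Hypothetical alias table: machine names mapped to one canonical arch name.
ARCH_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "aarch64",
    "arm64": "aarch64",
    "i386": "i386",
    "i686": "i386",
    "ppc64le": "ppc64le",
    "s390x": "s390x",
}


def arch_image(arch=None, name="debian", tag="stable"):
    """Build an arch-specific image reference; defaults to the local machine."""
    machine = (arch or platform.machine()).lower()
    try:
        canonical = ARCH_ALIASES[machine]
    except KeyError:
        raise ValueError(f"no image path known for architecture {machine!r}")
    return f"{REGISTRY_PREFIX}/{canonical}/{name}:{tag}"
```

Because the arch is spelled out in the path, a job asking for `arch_image("i386")` cannot silently get an amd64 image substituted from a local multi-arch cache.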

LMK if I should file a separate issue, since I'm specifically not asking for multi-arch images. I figured there was enough overlap though in wanting supported images for other archs that I'd start by adding here.

> For such jobs I think it'd be preferable to have completely separate images instead of using multi-arch images, since podman caching permits substituting different archs from a multi-arch image, potentially resulting in running on an i386 image when you expected amd64 or vice versa. See team#41621

is i386 really that much of a thing anymore? even OSes like Debian are considering dropping it entirely as a supported architecture...

looking at aarch64 (ARM), i don't think this applies, because you can't run aarch64 on amd64 and vice versa, so it's not like podman is just going to pick the wrong image there: you'll need to target the right runner...

> is i386 really that much of a thing anymore? even OSes like Debian are considering dropping it entirely as a supported architecture...

We currently test c-tor on i386, though I'm not opposed to revisiting that.

> looking at aarch64 (ARM), i don't think this applies, because you can't run aarch64 on amd64 and vice versa, so it's not like podman is just going to pick the wrong image there: you'll need to target the right runner...

Might we not run into the same issue on arm though? e.g. arm32v5, arm32v6, arm32v7, arm64v8 (see https://github.com/docker-library/official-images#architectures-other-than-amd64). Maybe today we only care about arm64v8, but what happens when tomorrow there's an arm64v9?

In general we might not run into the multi-arch caching footguns given the set of platforms we currently care about, but it'd be nice if we could just avoid using multi-arch images altogether until if and when it's a bit less footgun-ful.

i wonder if it's related to this issue, but since the bookworm upgrade of the Ganeti cluster, our capacity to run i386 images has improved (see #41656).

obviously, it doesn't provide us with builds for all architectures that we currently support, and we have quite a few runners over different architectures! according to our docs, we have 5:

amd64: popular 64-bit Intel/AMD architecture (equivalents: x86_64 and x86-64)

aarch64: the 64-bit ARM extension (equivalents: arm64 and arm64-v8a)

i386: 32-bit Intel/AMD architecture (equivalents: x86)

ppc64le: IBM Power architecture

s390x: Linux on IBM Z architecture

i'm skeptical that we actually have a real i386 server, as that platform is getting dropped from supported platforms in lots of places...

anyways, my point is we have at least 4 hardware platforms running CI runners right now. three of those are OSUOSL runners, which, fair, maybe we don't trust with building our images, but those are a thing!

i think that images built on those runners would naturally inherit the architecture of the host, and could be tagged accordingly. i'm not sure the current build harness here covers for this, but i figured i would mention this.

so i guess my point is i am not sure i would bother with cross-building images; i would just build them on machines that natively support that architecture.

i will also point out that since we enabled building more versions of our container images, each pipeline now runs a whopping 22 jobs, including 16 "other" containers: one for each combination of supported debian release (bullseye, bookworm, trixie and sid) and image type (golang, python, podman, redis-server). adding architecture to that matrix might make things slightly unwieldy, as we'd end up with 64 jobs on each build.

(we might be able to save up on that by taking out, say, podman and redis-server since i'm not sure those need to be built for all debian releases... we should also probably remove bullseye from the base images eventually, see #19 (closed).)
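The job counts above can be reproduced with a quick matrix expansion. This is only arithmetic over the lists named in the comment. The set of four architectures is an assumption (the four hardware platforms mentioned earlier), and the 64 figure covers the "other" containers only, not the rest of the 22 jobs.

```python
from itertools import product

RELEASES = ("bullseye", "bookworm", "trixie", "sid")
IMAGE_TYPES = ("golang", "python", "podman", "redis-server")
# Assumption: one build per hardware platform we currently have runners for.
ARCHES = ("amd64", "aarch64", "ppc64le", "s390x")


def build_matrix(releases, image_types, arches=(None,)):
    """Expand the build matrix into (release, image_type, arch) tuples."""
    return list(product(releases, image_types, arches))


print(len(build_matrix(RELEASES, IMAGE_TYPES)))          # current "other" containers
print(len(build_matrix(RELEASES, IMAGE_TYPES, ARCHES)))  # with an architecture axis
```

Trimming any one axis, for example dropping a release or an image type, shrinks the total multiplicatively, which is why pruning the list before adding architectures matters.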

some docs i found on how to build multi-arch containers, which @lavamind is now looking into because of team#42052 (closed)

it seems like while there's a convention to use arch prefixes in image names, the recommended way is to actually use the --platform flag during build.

we're experimenting.
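For reference, a hedged sketch of what using the --platform flag might look like when scripted. It only assembles the command line and does not run anything. The image name and the helper are hypothetical, and building for a platform other than the host's would presumably still need emulation or a native runner, as discussed earlier in this thread.

```python
import shlex


def build_command(image, platform, context=".", tool="podman"):
    """Assemble a container build command targeting one platform."""
    return [tool, "build", "--platform", platform, "--tag", image, context]


# Hypothetical image name, for illustration only.
for target in ("linux/amd64", "linux/arm64"):
    cmd = build_command("example.org/base-images/debian:stable", target)
    print(shlex.join(cmd))
```

Returning the command as a list rather than a string keeps it safe to hand to `subprocess.run` without shell quoting issues.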

added Backlog label and removed Icebox label

added CI Doing labels and removed Backlog label
