GitLab runner fails to resolve gitlab.torproject.org
Right after GeKo told me about this on IRC, I noticed it had happened in https://gitlab.torproject.org/tpo/core/tor/-/jobs/10528 too.
As mentioned on IRC, this might be the issue we are seeing here: https://gitlab.com/gitlab-org/gitlab-runner/-/issues/6644

Rebooting seems to have fixed the issue.
Looking at this comment in the issue ahf found, this might be a bridge mapping issue. The magic command:

```
docker inspect --format='{{.NetworkSettings.Networks}}' $CONTAINER_ID
```

... doesn't give us anything interesting:

```
map[bridge:0xc0005ec000]
```
A fuller output looks like:
```
root@ci-runner-01:~# docker inspect runner-9avwsm6s-project-321-concurrent-3-5dd950797d8d7760-predefined-2
[
    {
        "Id": "e8c1d0406421c8623f71e310d30b096fdfe71f5bb09a7157891a857ef6e47ab6",
        "Created": "2021-02-04T15:17:22.659419093Z",
        "Path": "/usr/bin/dumb-init",
        "Args": [ "/entrypoint", "gitlab-runner-build" ],
        "State": { "Status": "exited", "Running": false, "Paused": false, "Restarting": false,
            "OOMKilled": false, "Dead": false, "Pid": 0, "ExitCode": 0, "Error": "",
            "StartedAt": "2021-02-04T15:17:23.583299634Z", "FinishedAt": "2021-02-04T15:17:23.767182637Z" },
        "Image": "sha256:c398d3217fca9b237cc9946289c2790831d109248aea38723aeb5ee6da0f13a5",
        "ResolvConfPath": "/var/lib/docker/containers/e8c1d0406421c8623f71e310d30b096fdfe71f5bb09a7157891a857ef6e47ab6/resolv.conf",
        "HostnamePath": "/var/lib/docker/containers/e8c1d0406421c8623f71e310d30b096fdfe71f5bb09a7157891a857ef6e47ab6/hostname",
        "HostsPath": "/var/lib/docker/containers/e8c1d0406421c8623f71e310d30b096fdfe71f5bb09a7157891a857ef6e47ab6/hosts",
        "LogPath": "/var/lib/docker/containers/e8c1d0406421c8623f71e310d30b096fdfe71f5bb09a7157891a857ef6e47ab6/e8c1d0406421c8623f71e310d30b096fdfe71f5bb09a7157891a857ef6e47ab6-json.log",
        "Name": "/runner-9avwsm6s-project-321-concurrent-3-5dd950797d8d7760-predefined-2",
        "RestartCount": 0,
        "Driver": "overlay2",
        "Platform": "linux",
        "MountLabel": "",
        "ProcessLabel": "",
        "AppArmorProfile": "docker-default",
        "ExecIDs": null,
        "HostConfig": {
            "Binds": [ "runner-9avwsm6s-project-321-concurrent-3-cache-3c3f060a0374fc8bc39395164f415a70:/cache",
                "runner-9avwsm6s-project-321-concurrent-3-cache-c33bcaa1fd2c77edfc3893b41966cea8:/builds" ],
            "ContainerIDFile": "", "LogConfig": { "Type": "json-file", "Config": {} },
            "NetworkMode": "default", "PortBindings": null,
            "RestartPolicy": { "Name": "no", "MaximumRetryCount": 0 },
            "AutoRemove": false, "VolumeDriver": "", "VolumesFrom": null,
            "CapAdd": null, "CapDrop": null, "CgroupnsMode": "host",
            "Dns": null, "DnsOptions": null, "DnsSearch": null, "ExtraHosts": null,
            "GroupAdd": null, "IpcMode": "shareable", "Cgroup": "", "Links": null,
            "OomScoreAdj": 0, "PidMode": "", "Privileged": false, "PublishAllPorts": false,
            "ReadonlyRootfs": false, "SecurityOpt": null, "UTSMode": "", "UsernsMode": "",
            "ShmSize": 67108864, "Runtime": "runc", "ConsoleSize": [ 0, 0 ], "Isolation": "",
            "CpuShares": 0, "Memory": 0, "NanoCpus": 0, "CgroupParent": "", "BlkioWeight": 0,
            "BlkioWeightDevice": null, "BlkioDeviceReadBps": null, "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null, "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 0, "CpuQuota": 0, "CpuRealtimePeriod": 0, "CpuRealtimeRuntime": 0,
            "CpusetCpus": "", "CpusetMems": "", "Devices": null, "DeviceCgroupRules": null,
            "DeviceRequests": null, "KernelMemory": 0, "KernelMemoryTCP": 0,
            "MemoryReservation": 0, "MemorySwap": 0, "MemorySwappiness": null,
            "OomKillDisable": false, "PidsLimit": null, "Ulimits": null,
            "CpuCount": 0, "CpuPercent": 0, "IOMaximumIOps": 0, "IOMaximumBandwidth": 0,
            "MaskedPaths": [ "/proc/asound", "/proc/acpi", "/proc/kcore", "/proc/keys",
                "/proc/latency_stats", "/proc/timer_list", "/proc/timer_stats",
                "/proc/sched_debug", "/proc/scsi", "/sys/firmware" ],
            "ReadonlyPaths": [ "/proc/bus", "/proc/fs", "/proc/irq", "/proc/sys", "/proc/sysrq-trigger" ]
        },
        "GraphDriver": {
            "Data": {
                "LowerDir": "/var/lib/docker/overlay2/93054eff44c85c770103885eed501dc04248359bbfa842dfc84eebd6f4094ce1-init/diff:/var/lib/docker/overlay2/ef1b70c57e4f97533927651274889a59f6043f1a104800bc5045118411a41db8/diff",
                "MergedDir": "/var/lib/docker/overlay2/93054eff44c85c770103885eed501dc04248359bbfa842dfc84eebd6f4094ce1/merged",
                "UpperDir": "/var/lib/docker/overlay2/93054eff44c85c770103885eed501dc04248359bbfa842dfc84eebd6f4094ce1/diff",
                "WorkDir": "/var/lib/docker/overlay2/93054eff44c85c770103885eed501dc04248359bbfa842dfc84eebd6f4094ce1/work"
            },
            "Name": "overlay2"
        },
        "Mounts": [
            { "Type": "volume", "Name": "runner-9avwsm6s-project-321-concurrent-3-cache-3c3f060a0374fc8bc39395164f415a70",
              "Source": "/var/lib/docker/volumes/runner-9avwsm6s-project-321-concurrent-3-cache-3c3f060a0374fc8bc39395164f415a70/_data",
              "Destination": "/cache", "Driver": "local", "Mode": "z", "RW": true, "Propagation": "" },
            { "Type": "volume", "Name": "runner-9avwsm6s-project-321-concurrent-3-cache-c33bcaa1fd2c77edfc3893b41966cea8",
              "Source": "/var/lib/docker/volumes/runner-9avwsm6s-project-321-concurrent-3-cache-c33bcaa1fd2c77edfc3893b41966cea8/_data",
              "Destination": "/builds", "Driver": "local", "Mode": "z", "RW": true, "Propagation": "" }
        ],
        "Config": {
            "Hostname": "runner-9avwsm6s-project-321-concurrent-3",
            "Domainname": "", "User": "",
            "AttachStdin": true, "AttachStdout": true, "AttachStderr": true,
            "Tty": false, "OpenStdin": true, "StdinOnce": true,
            "Env": [ ENV LIST WITH SECRETS REDACTED ],
            "Cmd": [ "gitlab-runner-build" ],
            "Image": "sha256:c398d3217fca9b237cc9946289c2790831d109248aea38723aeb5ee6da0f13a5",
            "Volumes": null, "WorkingDir": "",
            "Entrypoint": [ "/usr/bin/dumb-init", "/entrypoint" ],
            "OnBuild": null,
            "Labels": {
                "com.gitlab.gitlab-runner.job.before_sha": "7430b4ef9f4b0371502560126b5342dc4f117371",
                "com.gitlab.gitlab-runner.job.id": "10797",
                "com.gitlab.gitlab-runner.job.ref": "maint-1.1",
                "com.gitlab.gitlab-runner.job.sha": "beaf6de889bc75d53a6b0b90d12ab85aa0db56a0",
                "com.gitlab.gitlab-runner.pipeline.id": "2507",
                "com.gitlab.gitlab-runner.project.id": "321",
                "com.gitlab.gitlab-runner.runner.id": "9avWSM6S",
                "com.gitlab.gitlab-runner.runner.local_id": "0",
                "com.gitlab.gitlab-runner.type": "predefined"
            }
        },
        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "[LONG HEX HASH REDACTED]",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "", "LinkLocalIPv6PrefixLen": 0,
            "Ports": {},
            "SandboxKey": "/var/run/docker/netns/[SHORT HEX HASH REDACTED]",
            "SecondaryIPAddresses": null, "SecondaryIPv6Addresses": null,
            "EndpointID": "", "Gateway": "",
            "GlobalIPv6Address": "", "GlobalIPv6PrefixLen": 0,
            "IPAddress": "", "IPPrefixLen": 0, "IPv6Gateway": "", "MacAddress": "",
            "Networks": {
                "bridge": {
                    "IPAMConfig": null, "Links": null, "Aliases": null,
                    "NetworkID": "[ANOTHER LONG HEX HASH REDACTED]",
                    "EndpointID": "", "Gateway": "", "IPAddress": "", "IPPrefixLen": 0,
                    "IPv6Gateway": "", "GlobalIPv6Address": "", "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "", "DriverOpts": null
                }
            }
        }
    }
]
```
So I don't think there's an easy fix for us there, unfortunately. But we'll see: next time this happens, maybe this output will be different enough to figure out what is wrong.

"How hard can networking be", right? :)

In the meantime, I'll close this. Please do reopen this ticket (or file a new one) when/if it happens again. Sorry for the inconvenience, and thanks for flying TPA! :)
- anarcat closed
- anarcat reopened
There was no upgrade or reboot recently; the latest docker upgrade was:
```
root@ci-runner-01:~# grep docker /var/log/dpkg.log*
/var/log/dpkg.log:2021-02-03 21:55:24 upgrade docker.io:amd64 18.09.1+dfsg1-7.1+deb10u2 20.10.2+dfsg1-2
root@ci-runner-01:~# uptime
 15:36:47 up 5 days, 26 min,  1 user,  load average: 0.06, 0.25, 0.25
```
Those dates match when this problem manifested itself the last time.
I can confirm that networking is completely down inside the container. I verified this with a Python image doing a simple `socket` connect:

```
>>> import socket
>>> sock = socket.socket()
>>> sock.connect(('206.248.172.91', 80))
[hangs]
^CTraceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyboardInterrupt
```
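For what it's worth, roughly the same probe can be run non-interactively with a timeout, so it fails fast instead of hanging. A sketch using bash's `/dev/tcp` feature, with the same IP as above:

```
# Try a TCP connect to port 80 from inside a fresh container; give up
# after 5 seconds instead of hanging like the Python test above.
docker run --rm debian:stable \
    timeout 5 bash -c 'exec 3<>/dev/tcp/206.248.172.91/80' \
    && echo "network OK" || echo "network broken"
```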
I restarted docker, which fixed the problem, but it's kind of annoying that this keeps coming up like this.

Docker had run out of disk space today (#95 (closed)); maybe that was the cause?
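Next time, something like this could quickly rule disk space in or out (assuming the standard /var/lib/docker layout):

```
# Check free space where docker stores images/containers, and docker's
# own accounting of image, container and volume usage.
df -h /var/lib/docker
docker system df
```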
Closing until this happens again...
- anarcat closed
- anarcat mentioned in commit wiki-replica@78906ab9
This happened again. Restarting docker fixed it, but I have no idea what triggered it this time: we didn't run out of disk space, and there have been no upgrades in over a week:
```
root@ci-runner-01:~# grep upgrade /var/log/dpkg.log | tail -10
2021-02-19 06:40:08 upgrade libdns1104:amd64 1:9.11.5.P4+dfsg-5.1+deb10u2 1:9.11.5.P4+dfsg-5.1+deb10u3
2021-02-19 06:40:08 upgrade libisc1100:amd64 1:9.11.5.P4+dfsg-5.1+deb10u2 1:9.11.5.P4+dfsg-5.1+deb10u3
2021-02-19 06:40:08 upgrade liblwres161:amd64 1:9.11.5.P4+dfsg-5.1+deb10u2 1:9.11.5.P4+dfsg-5.1+deb10u3
2021-02-19 06:40:08 upgrade libisc-export1100:amd64 1:9.11.5.P4+dfsg-5.1+deb10u2 1:9.11.5.P4+dfsg-5.1+deb10u3
2021-02-19 06:40:09 upgrade libdns-export1104:amd64 1:9.11.5.P4+dfsg-5.1+deb10u2 1:9.11.5.P4+dfsg-5.1+deb10u3
2021-02-21 06:45:04 upgrade libzstd1:amd64 1.3.8+dfsg-3+deb10u1 1.3.8+dfsg-3+deb10u2
2021-02-21 06:45:04 upgrade ldap-utils:amd64 2.4.47+dfsg-3+deb10u5 2.4.47+dfsg-3+deb10u6
2021-02-21 06:45:04 upgrade libldap-common:all 2.4.47+dfsg-3+deb10u5 2.4.47+dfsg-3+deb10u6
2021-02-21 06:45:04 upgrade libldap-2.4-2:amd64 2.4.47+dfsg-3+deb10u5 2.4.47+dfsg-3+deb10u6
2021-02-22 06:40:17 upgrade screen:amd64 4.6.2-3 4.6.2-3+deb10u1
```
@ahf reported this today, and I doubt he would have tolerated the problem for a full week, so this is not an upgrade problem.
- anarcat reopened
This happened again at some point since the setup was poked at last night (Danish time, CET).

Based on my inbox, I see the following emails:
```
N GitLab Failed pipeline for main | Triage Ops | 775dc2b9 2021/03/02 01:03
N GitLab Fixed pipeline for main  | Triage Ops | 775dc2b  2021/03/02 02:03
N GitLab Failed pipeline for main | Triage Ops | 775dc2b9 2021/03/02 06:03
```
It seems like it happened somewhere between 02:03 (UTC), when anarcat fixed the runner, and 06:03 (UTC), when the hourly triage ops project failed again.
And you were saying on IRC that you probably had a working pipeline go through at 05:03 as well, because that pipeline runs hourly, right?

So this interesting thing happened during that period: the firewall rules were reloaded by Puppet...
```
Mar 2 05:32:38 ci-runner-01/ci-runner-01 puppet-agent[29768]: (/Stage[main]/Nagios::Client/Ferm::Rule[roles-nagiosmaster-ssh-hetzner-hel1-01.torproject.org]/File[/etc/ferm/tor.d/00_roles-nagiosmaster-ssh-hetzner-hel1-01.torproject.org]/content) content changed '{md5}2b9f880a29c99666e1c9cd9e9198b93c' to '{md5}241239967a530bc4a65e333f3bb4a78d'
Mar 2 05:32:38 ci-runner-01/ci-runner-01 puppet-agent[29768]: (/Stage[main]/Nagios::Client/Ferm::Rule[roles-nagiosmaster-nrpe-hetzner-hel1-01.torproject.org]/File[/etc/ferm/tor.d/00_roles-nagiosmaster-nrpe-hetzner-hel1-01.torproject.org]/content) content changed '{md5}754de66b83324832b8169788843adea4' to '{md5}47052b5099b122f3f9cb0dbd8bffe22d'
Mar 2 05:32:38 ci-runner-01/ci-runner-01 systemd[1]: Reloading ferm firewall configuration.
Mar 2 05:32:38 ci-runner-01/ci-runner-01 ferm[29958]: Reloading Firewall configuration....
Mar 2 05:32:38 ci-runner-01/ci-runner-01 systemd[1]: Reloaded ferm firewall configuration.
Mar 2 05:32:38 ci-runner-01/ci-runner-01 puppet-agent[29768]: (/Stage[main]/Ferm/Exec[ferm reload]) Triggered 'refresh' from 2 events
```
Maybe that's the cause? And indeed, if I fix the problem (by restarting docker) and then reload the firewall rules, the problem comes back!
```
root@ci-runner-01:~# service docker restart
root@ci-runner-01:~# docker run -it --rm debian:stable ping -c 3 torproject.org
PING torproject.org (116.202.120.165) 56(84) bytes of data.
64 bytes from web-fsn-01.torproject.org (116.202.120.165): icmp_seq=1 ttl=55 time=124 ms
64 bytes from web-fsn-01.torproject.org (116.202.120.165): icmp_seq=2 ttl=55 time=124 ms
64 bytes from web-fsn-01.torproject.org (116.202.120.165): icmp_seq=3 ttl=55 time=126 ms

--- torproject.org ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 5ms
rtt min/avg/max/mdev = 123.559/124.411/126.079/1.213 ms
root@ci-runner-01:~# service ferm reload
root@ci-runner-01:~# docker run -it --rm debian:stable ping -c 3 torproject.org
ping: torproject.org: Temporary failure in name resolution
root@ci-runner-01:~#
```
Isn't that interesting? :) I have absolutely no idea what is going on here, but at least it does seem like a reproducible failure!
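For the record, here is the reproducer distilled from the transcript above (same commands, one ping each):

```
# Reproduce the failure: after a docker restart, container networking
# works; a ferm reload breaks it until docker is restarted again.
service docker restart
docker run --rm debian:stable ping -c 1 torproject.org   # works
service ferm reload
docker run --rm debian:stable ping -c 1 torproject.org   # name resolution fails
```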
It seems this is a known problem in Docker too! I'll analyze that thread and see what I can come up with.
There are many workarounds suggested in that discussion:
- restart docker after reloading the firewall rules
- rewrite the firewall rules to avoid flushing the docker rules; probably not practical without severe hacking in iptables/ferm (see the sketch after this list)
- another similar hack, called docker-fw
- have upstream add a `docker network reload-firewall` command, to avoid restarting the entire daemon (comment; not implemented, of course)
- run Docker inside its own network namespace with systemd-named-netns (comment)
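On the second option: ferm does have a `@preserve` keyword in newer releases (I have not checked whether the version we run supports it) that re-injects a chain's current contents after a flush. A hypothetical, untested sketch of what that could look like in a snippet under the /etc/ferm/tor.d/ directory already in use here:

```
# untested sketch: keep Docker's chains across ferm reloads (@preserve);
# note that the jump rules Docker adds to FORWARD and to nat POSTROUTING
# would still need to be handled separately.
domain ip {
    table filter {
        chain DOCKER @preserve;
        chain DOCKER-ISOLATION-STAGE-1 @preserve;
        chain DOCKER-ISOLATION-STAGE-2 @preserve;
        chain DOCKER-USER @preserve;
    }
    table nat {
        chain DOCKER @preserve;
    }
}
```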
I can confirm that reloading ferm flushes critical firewall rules from Docker:
```
--- before 2021-03-02 14:29:54.856964851 +0000
+++ after 2021-03-02 14:30:00.737016665 +0000
@@ -101,12 +101,6 @@
 Chain FORWARD (policy ACCEPT)
 target prot opt source destination
-DOCKER-USER all -- 0.0.0.0/0 0.0.0.0/0
-DOCKER-ISOLATION-STAGE-1 all -- 0.0.0.0/0 0.0.0.0/0
-ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
-DOCKER all -- 0.0.0.0/0 0.0.0.0/0
-ACCEPT all -- 0.0.0.0/0 0.0.0.0/0
-ACCEPT all -- 0.0.0.0/0 0.0.0.0/0
 
 Chain OUTPUT (policy ACCEPT)
 target prot opt source destination
@@ -149,20 +143,3 @@
 ACCEPT tcp -- 193.10.5.2 0.0.0.0/0
 ACCEPT tcp -- 206.248.172.91 0.0.0.0/0
 ACCEPT tcp -- 216.137.119.51 0.0.0.0/0
-
-Chain DOCKER (1 references)
-target prot opt source destination
-
-Chain DOCKER-ISOLATION-STAGE-1 (1 references)
-target prot opt source destination
-DOCKER-ISOLATION-STAGE-2 all -- 0.0.0.0/0 0.0.0.0/0
-RETURN all -- 0.0.0.0/0 0.0.0.0/0
-
-Chain DOCKER-ISOLATION-STAGE-2 (1 references)
-target prot opt source destination
-DROP all -- 0.0.0.0/0 0.0.0.0/0
-RETURN all -- 0.0.0.0/0 0.0.0.0/0
-
-Chain DOCKER-USER (1 references)
-target prot opt source destination
-RETURN all -- 0.0.0.0/0 0.0.0.0/0
```
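(For reference, a diff like this can be captured with something along these lines; the exact invocation used above wasn't recorded, so consider this an assumption:)

```
# Snapshot the filter table before and after a ferm reload and compare.
iptables -L -n > /tmp/before
service ferm reload
iptables -L -n > /tmp/after
diff -u /tmp/before /tmp/after
```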
Maybe it's just a matter of adding those rules ourselves? I'm hesitating between that and adding a systemd override to make sure docker is restarted when ferm is reloaded... Rules don't change that often, and the worst effect would be a failed pipeline because of the interruption...
I'll also point out that a podman-based runner might not be affected by this bug.
It seems like the simplest workaround is to re-add this firewall rule from Docker:

```
iptables -t nat -A POSTROUTING -s 172.17.0.0/16 \! -o docker0 -j MASQUERADE
```

It's rather strange that the other rules are apparently not needed, but at least that works. We could add it to our ferm configs to fix this issue completely.
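A hypothetical sketch of that rule in ferm syntax (untested; 172.17.0.0/16 and docker0 are the defaults visible in the rule above), which could go in a snippet under /etc/ferm/tor.d/:

```
# untested sketch: re-add Docker's masquerade rule from our own ferm
# config, so it survives ferm reloads even when Docker's chains get flushed
domain ip {
    table nat {
        chain POSTROUTING {
            saddr 172.17.0.0/16 outerface ! docker0 MASQUERADE;
        }
    }
}
```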
Unfortunately, hooking up docker.service into ferm.service is not possible, because systemd doesn't allow service dependencies on service reloads, only restarts. We'd have to add something like `service docker restart` to the `ExecReload` command of ferm.service, and that's really yucky.

That's actually not accurate: a reload can trigger a reload (https://www.freedesktop.org/software/systemd/man/systemd.unit.html#PropagatesReloadTo=) and a restart a restart (https://www.freedesktop.org/software/systemd/man/systemd.unit.html#BindsTo=). It still might not work for our case, because a docker reload is not sufficient to fix the bug.

Thanks for working on this.
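Regarding the `PropagatesReloadTo=` correction above, a drop-in would look something like this (untested sketch; and as noted, a docker *reload* alone does not fix the bug, so this probably wouldn't help as-is):

```
# /etc/systemd/system/ferm.service.d/propagate.conf (hypothetical)
[Unit]
# make "systemctl reload ferm" also trigger "systemctl reload docker"
PropagatesReloadTo=docker.service
```

A `systemctl daemon-reload` would be needed to pick that up.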
It seems to have started happening again: https://gitlab.torproject.org/juga/sbws/-/jobs/13965
To make sure this survives the next firewall reload, I've done this gross override:
```
root@ci-runner-01:~# systemctl cat ferm
# /lib/systemd/system/ferm.service
[Unit]
Description=ferm firewall configuration
RequiresMountsFor=/var/cache/
Wants=network-pre.target
Before=network-pre.target shutdown.target
Conflicts=shutdown.target
DefaultDependencies=no

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/etc/init.d/ferm start
ExecReload=/etc/init.d/ferm reload
ExecStop=/etc/init.d/ferm stop

[Install]
WantedBy=sysinit.target

# /etc/systemd/system/ferm.service.d/override.conf
[Service]
ExecReload=/etc/init.d/ferm reload
ExecReload=service docker restart

root@ci-runner-01:~# systemctl show ferm | grep ExecReload
ExecReload={ path=/etc/init.d/ferm ; argv[]=/etc/init.d/ferm reload ; ignore_errors=no ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
ExecReload={ path=/etc/init.d/ferm ; argv[]=/etc/init.d/ferm reload ; ignore_errors=no ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
ExecReload={ path=/usr/sbin/service ; argv[]=/usr/sbin/service docker restart ; ignore_errors=no ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
root@ci-runner-01:~# docker run -it --rm debian:stable ping -c 3 torproject.org
PING torproject.org (95.216.163.36) 56(84) bytes of data.
64 bytes from hetzner-hel1-03.torproject.org (95.216.163.36): icmp_seq=1 ttl=50 time=121 ms
64 bytes from hetzner-hel1-03.torproject.org (95.216.163.36): icmp_seq=2 ttl=50 time=121 ms
64 bytes from hetzner-hel1-03.torproject.org (95.216.163.36): icmp_seq=3 ttl=50 time=121 ms

--- torproject.org ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 5ms
rtt min/avg/max/mdev = 120.979/121.098/121.224/0.301 ms
root@ci-runner-01:~# systemctl reload ferm
root@ci-runner-01:~# docker run -it --rm debian:stable ping -c 3 torproject.org
PING torproject.org (116.202.120.166) 56(84) bytes of data.
64 bytes from web-fsn-02.torproject.org (116.202.120.166): icmp_seq=1 ttl=55 time=123 ms
64 bytes from web-fsn-02.torproject.org (116.202.120.166): icmp_seq=2 ttl=55 time=123 ms
^C
--- torproject.org ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 3ms
rtt min/avg/max/mdev = 123.053/123.070/123.087/0.017 ms
root@ci-runner-01:~# systemctl status ferm
● ferm.service - ferm firewall configuration
   Loaded: loaded (/lib/systemd/system/ferm.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/ferm.service.d
           └─override.conf
   Active: active (exited) since Thu 2021-02-04 15:10:19 UTC; 1 months 2 days ago
  Process: 31253 ExecReload=/etc/init.d/ferm reload (code=exited, status=0/SUCCESS)
  Process: 31276 ExecReload=/etc/init.d/ferm reload (code=exited, status=0/SUCCESS)
  Process: 31298 ExecReload=/usr/sbin/service docker restart (code=exited, status=0/SUCCESS)
 Main PID: 238 (code=exited, status=0/SUCCESS)

Mar 02 14:52:23 ci-runner-01 systemd[1]: Reloading ferm firewall configuration.
Mar 02 14:52:23 ci-runner-01 ferm[6739]: Reloading Firewall configuration....
Mar 02 14:52:23 ci-runner-01 systemd[1]: Reloaded ferm firewall configuration.
Mar 03 10:29:29 ci-runner-01 systemd[1]: Reloading ferm firewall configuration.
Mar 03 10:29:29 ci-runner-01 ferm[30223]: Reloading Firewall configuration....
Mar 03 10:29:29 ci-runner-01 systemd[1]: Reloaded ferm firewall configuration.
Mar 09 20:52:05 ci-runner-01 systemd[1]: Reloading ferm firewall configuration.
Mar 09 20:52:05 ci-runner-01 ferm[31253]: Reloading Firewall configuration....
Mar 09 20:52:06 ci-runner-01 ferm[31276]: Reloading Firewall configuration....
Mar 09 20:52:19 ci-runner-01 systemd[1]: Reloaded ferm firewall configuration.
```
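Side note: the status output above shows the ferm reload running twice, because `ExecReload=` lines in a drop-in are appended to the unit's existing list. A slightly cleaner variant (untested sketch) would reset the list first with an empty assignment:

```
# /etc/systemd/system/ferm.service.d/override.conf, with ExecReload=
# cleared before re-adding commands, so the ferm reload only runs once
[Service]
ExecReload=
ExecReload=/etc/init.d/ferm reload
ExecReload=/usr/sbin/service docker restart
```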
So yeah, it's kind of gross, because it's a (hidden) service dependency... I'd much rather have the firewall rules properly reloaded on `service docker reload`, so that we could have the "reloads" depend on each other, but alas, this would require a patch to docker and to the service files, so it's not going to happen anytime soon.

I'm not super comfortable with adding just the firewall rule either, because I'm not sure it's sufficient to ensure proper operation. Ping might work, but other firewall issues could fail in more subtle ways, so I don't really want to play around with this.
The downside of this approach is that it will probably kill any running container when ferm reloads, but that beats having all containers fail from then on.
The only remaining task is to add this hack to Puppet now.
- anarcat closed
Apparently this is something specific to the Docker image used to run the runner. There's a workaround described here:

https://gitlab.com/gitlab-org/gitlab-runner/-/issues/6644#note_593121647

... and this might even be fixed in the next upstream release, so I'm reopening this to track it.
- anarcat reopened
- anarcat closed
- anarcat added Anti-Censorship label
- anarcat removed Anti-Censorship label
Due to a bug in the Puppet configuration (#109 (closed)), this was only deployed just now.
So this happened again today: https://gitlab.torproject.org/tpo/core/arti/-/jobs/34618

Reopening.
- anarcat reopened
- anarcat closed
- anarcat mentioned in issue team#40368 (closed)
Restarted docker on shadow in team#40368 (closed) and on the arm builder as well, just in case.
- Jérôme Charaoui marked this issue as related to team#40541 (closed)
- Jérôme Charaoui mentioned in issue team#40541 (closed)
For completeness's sake: there was an update to the upstream ticket recently which says it's just a matter of adding a `dns =` entry to the runner config. I doubt it would solve our problem, because it's not just DNS that fails but the entire network stack. It's possible there are multiple tangled-up issues here as well.
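For reference, the upstream suggestion amounts to something like this in the runner's `config.toml` (excerpt; the resolver address here is a placeholder of my choosing, not a recommendation):

```
# /etc/gitlab-runner/config.toml (excerpt)
[[runners]]
  [runners.docker]
    # hand containers an explicit resolver instead of the docker-generated one
    dns = ["1.1.1.1"]
```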