prevent Puppet from restarting docker (and gitlab-runner?)

I had a job die mysteriously this morning:

https://gitlab.torproject.org/jnewsome/sponsor-61-sims/-/jobs/39943#L7771

ERROR: Job failed (system failure): aborted: terminated

And at the top of the page:

There has been a runner system failure, please try again

@anarcat mentioned this might have been related to @lavamind doing some Puppet work, which could have triggered a restart of gitlab-runner or docker.

If possible, could we confirm this is what happened? Is there some safeguard we could put in place to prevent such restarts while a job is running? I feel like this might be another pain point of shoe-horning shadow sims into CI jobs: for most CI jobs it's probably no big deal to get killed and have to restart, but in this case we lost 20h of computation.

Edited Dec 06, 2021 by anarcat