GitLab is failing to deliver mail notifications, among other issues (queuing problems?)

It looks like queues are not being processed.

This problem has other symptoms. Here is a possibly non-exhaustive list of broken things:

  • mail notifications
  • IRC notifications
  • scheduled CI jobs (e.g. https://gitlab.torproject.org/tpo/tpa/triage-ops/-/pipelines)
  • merging merge requests

Steps to reproduce

Comment on an issue.

What is the current bug behavior?

Mail notifications are not sent.

What is the expected correct behavior?

Mail notifications should be sent.

When did this start?

Unclear. First reported at 10:00 UTC on #tor-admin by @gk.

@ahf mentioned the scheduled CI jobs:

09:00:54 <ahf> died somewhere within 5-15 min. after Jun 25, 2025, 09:27 GMT+2 (07:27 UTC) today

Relevant logs and/or screenshots

Nothing in the Sidekiq or mailroom logs. I followed the "outgoing mail" pager playbook, and gitlab-console is able to send mail fine, so this is a queuing issue rather than a mail delivery issue.

Possible fixes

There's a DiskWillFillSoon alert in Karma that could be the root cause here; @lelutin is looking into it.

Next steps

  • update status.tpo
  • remove old volume group on gitlab-02
  • remove extra disk in ganeti
  • rename new volume group to remove hdd suffix (it's on SSD like everything else)
  • regroup all LVs into one / stop using VGs? Not worth it (possibly better delegated to a later "rebuild gitlab-02", see #42218 (comment 3217801))
  • monitoring needed for sidekiq (and dashboards?)
  • timeline
  • monitoring for disk space? chore-level, see #42218 (comment 3217901)
  • root cause analysis: see discussion in #42218 (comment 3217883); likely a limitation in Sidekiq, fixed by restarting it
Edited Jun 25, 2025 by anarcat