DjangoExceptions alerts mysteriously failed to send notifications

Before the TypeError exceptions (tpo/web/donate-neo#122 (closed)) were fixed (!53 (closed)), we were getting repeated alerts that were firing, but not being sent on IRC.

Here is a graph of the pending and firing alerts for the DjangoException rule in the 24h before the TypeError fix was deployed:

image

There you can just plain see the alert is in the "firing" state. Yet on the #tor-alerts IRC channel, things were utterly silent.

I don't quite understand why those alerts didn't send notifications (and flap!) like crazy. The Grafana panel above clearly shows the alerts firing, sometimes for 5 minutes at a time, which, in my understanding of Prometheus alerting rules, should have triggered a notification. Take this range for example:

image

(from 2024-09-18 09:14:55 to 2024-09-18 10:36:30)

There you can see a case where an alert was firing for 5 minutes straight, yet we never saw anything about it in the IRC channel.

Looking at the actual exception counter rate:

image https://grafana.torproject.org/d/f36842c2-af41-48c2-ab71-442307ba2f75/donate-neo-donations?orgId=1&from=1726650895000&to=1726655790000&viewPanel=24

... is interesting, because it shows that we almost never got a solid 10 minutes of "one exception per minute" increase. It always petered out after a bit.

That was my main motivation behind !53 (closed): to test the theory that for: is what keeps notifications from being sent. But I don't quite buy it: for: is what makes the difference between a pending and a firing alert, and the alerts are clearly firing.
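
To make the pending/firing distinction concrete, here is a minimal Python sketch of how I understand for: to behave. It is not Prometheus code, just a toy model: the function name alert_states, the 1-minute evaluation interval and the 5-minute for: value are all assumptions picked for illustration, not values taken from our actual rule.

```python
from datetime import timedelta


def alert_states(samples, for_duration, eval_interval):
    """Return the modeled alert state after each rule evaluation.

    samples: list of booleans, True when the rule expression matched
             at that evaluation (hypothetical input).
    for_duration: how long the expression must keep matching before
                  the alert moves from "pending" to "firing".
    eval_interval: time between two rule evaluations.
    """
    states = []
    active_since = None  # index of the evaluation where the match started
    for i, matched in enumerate(samples):
        if not matched:
            # Any evaluation without a match resets the timer.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = i
        held = (i - active_since) * eval_interval
        states.append("firing" if held >= for_duration else "pending")
    return states


# Assumed values: 7 matching evaluations, one gap, then 3 more matches.
example = [True] * 7 + [False] + [True] * 3
for state in alert_states(example, timedelta(minutes=5), timedelta(minutes=1)):
    print(state)
```

In this model, only the evaluations marked "firing" would be handed over to Alertmanager at all. The sketch deliberately stops there: it does not model anything Alertmanager does afterwards (routing, grouping, the IRC relay), which is exactly the part that remains unexplained here.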

So something else might be up here.

In any case, worth investigating further.