add Prometheus alerting for donate-neo

in #72 (closed), we hooked donate-neo up to Prometheus so it starts scraping metrics, but we also want alerting (and a dashboard) to follow progress.

remaining work:

  • graph and alert on failed transactions, requires a fix for #116 (closed)
  • graph and alert on the "kill switch", as a follow-up to https://gitlab.torproject.org/tpo/web/civicrm/-/issues/78
  • fix alerting on exceptions (it is currently not sensitive enough and didn't detect #122 (closed))
  • optionally, alert on stale jobs (already done)
  • optionally, cover the new metrics from https://gitlab.torproject.org/tpo/web/civicrm/-/issues/148 (follow-up in that issue)
  • document all of this in the service/crm and service/donate wiki pages
Edited Oct 08, 2024 by anarcat