@@ -505,13 +505,12 @@ regular releases. Our consultant is part of the core team.
## Monitoring and metrics
Like other TPA servers, the CRM servers are monitored by
[Prometheus](service/prometheus). In particular, a `blackbox` check watches
the Redis server (and the related IPsec tunnel) to make sure both ends
can talk to each other.
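As an illustration only, a reachability check of this kind boils down to three steps: open a TCP connection to the Redis port, send `PING`, and expect a `+PONG` reply. The sketch below is not the actual `blackbox` configuration TPA uses. The function name `redis_ping_ok`, the default port 6379, and the absence of Redis authentication or TLS are all assumptions made for the example.

```python
import socket


def redis_ping_ok(host: str, port: int = 6379, timeout: float = 5.0) -> bool:
    """Return True if a Redis server at host:port answers PING with +PONG.

    Hypothetical helper: assumes a plain, unauthenticated Redis endpoint
    reachable (for example) through the IPsec tunnel.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            # Redis accepts inline commands terminated by CRLF.
            conn.sendall(b"PING\r\n")
            reply = b""
            # Read until the end of the first reply line, or until the peer closes.
            while not reply.endswith(b"\r\n"):
                chunk = conn.recv(64)
                if not chunk:
                    break
                reply += chunk
    except OSError:
        # Connection refused, timeout, tunnel down, and similar failures.
        return False
    # An error reply (for example "-NOAUTH ...") also counts as a failure.
    return reply.startswith(b"+PONG")
```

You can run this from either end of the tunnel, for example `redis_ping_ok("192.0.2.10")` with a placeholder address. A `False` result means that either Redis or the tunnel between the two hosts is not answering.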
There are also graphs rendered by [Grafana](howto/grafana), including an
elaborate [Postfix dashboard](https://grafana.torproject.org/d/Ds5BxBYGk/postfix-mtail?orgId=1&from=now-24h&to=now&var-node=eugeni.torproject.org&var-node=crm-int-01.torproject.org) that watches the two mail servers.
We have started working on [monitoring CiviCRM health better](https://gitlab.torproject.org/tpo/web/civicrm/-/issues/78).
The [donate neo donations](https://grafana.torproject.org/d/f36842c2-af41-48c2-ab71-442307ba2f75/donate-neo-donations) dashboard is the main view of the
service in Grafana. It shows the state of the CiviCRM kill switch,
transaction rates, errors, the rate limiter, and exception counts. It
@@ -838,8 +835,6 @@ draw correlations if there are issues with the service.
There are also links, on the top-right, to Django-specific dashboards
that can be used to diagnose performance issues.
See [tpo/web/donate-neo#75](https://gitlab.torproject.org/tpo/web/donate-neo/-/issues/75) for followup on missing metrics.
Also note that the CiviCRM side of things has its own metrics; see the
[CiviCRM monitoring and metrics documentation](service/crm#monitoring-and-metrics).