Verified Commit a3319780 authored by anarcat

update donate docs: we have metrics now

parent f4b9adcf
+5 −6
@@ -505,13 +505,12 @@ regular releases. Our consultant is part of the core team.
## Monitoring and metrics

As other TPA servers, the CRM servers are monitored by
-[Nagios](howto/nagios). The Redis server (and the related IPsec tunnel) is
-particularly monitored by Nagios, using a special `PING` check, to
-make sure both ends can talk to each other.
+[Prometheus](service/prometheus). The Redis server (and the related IPsec tunnel) is
+particularly monitored, using a `blackbox` check, to make sure both
+ends can talk to each other.

-There's also [Prometheus](service/prometheus) monitoring with graphs rendered by
-[Grafana](howto/grafana). This includes an elaborate [Postfix dashboard](https://grafana.torproject.org/d/Ds5BxBYGk/postfix-mtail?orgId=1&from=now-24h&to=now&var-node=eugeni.torproject.org&var-node=crm-int-01.torproject.org)
-watching to two mail servers.
+There's also graphs rendered by [Grafana](howto/grafana). This includes an
+elaborate [Postfix dashboard](https://grafana.torproject.org/d/Ds5BxBYGk/postfix-mtail?orgId=1&from=now-24h&to=now&var-node=eugeni.torproject.org&var-node=crm-int-01.torproject.org) watching to two mail servers.

We started working on [monitoring the CiviCRM health better](https://gitlab.torproject.org/tpo/web/civicrm/-/issues/78). So
far we collect metrics that look like this:
+0 −5
@@ -826,9 +826,6 @@ that will pop alerts on IRC if problems come up with the service. All
of them have playbooks that link to the [pager playbook](#pager-playbook) section
here.

-We currently don't correctly cover for failed transactions, see
-[tpo/web/donate-neo#116](https://gitlab.torproject.org/tpo/web/donate-neo/-/issues/116).

The [donate neo donations](https://grafana.torproject.org/d/f36842c2-af41-48c2-ab71-442307ba2f75/donate-neo-donations) dashboard is the main view of the
service in Grafana. It shows the state of the CiviCRM kill switch,
transaction rates, errors, the rate limiter, and exception counts. It
@@ -838,8 +835,6 @@ draw correlations if there are issues with the service.
There are also links, on the top-right, to Django-specific dashboards
that can be used to diagnose performance issues.

-See [tpo/web/donate-neo#75](https://gitlab.torproject.org/tpo/web/donate-neo/-/issues/75) for followup on missing metrics.

Also note that the CiviCRM side of things has its own metrics, see the
[CiviCRM monitoring and metrics documentation](service/crm#monitoring-and-metrics).