update donate docs: we have metrics now (a3319780) · Commits · The Tor Project / TPA / Wiki Replica

service/crm.md

+5 −6

@@ -505,13 +505,12 @@ regular releases. Our consultant is part of the core team.
 ## Monitoring and metrics

 As other TPA servers, the CRM servers are monitored by
-[Nagios](howto/nagios). The Redis server (and the related IPsec tunnel) is
-particularly monitored by Nagios, using a special `PING` check, to
-make sure both ends can talk to each other.
+[Prometheus](service/prometheus). The Redis server (and the related IPsec tunnel) is
+particularly monitored, using a `blackbox` check, to make sure both
+ends can talk to each other.

-There's also [Prometheus](service/prometheus) monitoring with graphs rendered by
-[Grafana](howto/grafana). This includes an elaborate [Postfix dashboard](https://grafana.torproject.org/d/Ds5BxBYGk/postfix-mtail?orgId=1&from=now-24h&to=now&var-node=eugeni.torproject.org&var-node=crm-int-01.torproject.org)
-watching to two mail servers.
+There's also graphs rendered by [Grafana](howto/grafana). This includes an
+elaborate [Postfix dashboard](https://grafana.torproject.org/d/Ds5BxBYGk/postfix-mtail?orgId=1&from=now-24h&to=now&var-node=eugeni.torproject.org&var-node=crm-int-01.torproject.org) watching to two mail servers.

 We started working on [monitoring the CiviCRM health better](https://gitlab.torproject.org/tpo/web/civicrm/-/issues/78). So
 far we collect metrics that look like this:

+0 −5

@@ -826,9 +826,6 @@ that will pop alerts on IRC if problems come up with the service. All
 of them have playbooks that link to the [pager playbook](#pager-playbook) section
 here.

-We currently don't correctly cover for failed transactions, see
-[tpo/web/donate-neo#116](https://gitlab.torproject.org/tpo/web/donate-neo/-/issues/116).
-
 The [donate neo donations](https://grafana.torproject.org/d/f36842c2-af41-48c2-ab71-442307ba2f75/donate-neo-donations) dashboard is the main view of the
 service in Grafana. It shows the state of the CiviCRM kill switch,
 transaction rates, errors, the rate limiter, and exception counts. It
@@ -838,8 +835,6 @@ draw correlations if there are issues with the service.
 There are also links, on the top-right, to Django-specific dashboards
 that can be used to diagnose performance issues.

-See [tpo/web/donate-neo#75](https://gitlab.torproject.org/tpo/web/donate-neo/-/issues/75) for followup on missing metrics.
-
 Also note that the CiviCRM side of things has its own metrics, see the
 [CiviCRM monitoring and metrics documentation](service/crm#monitoring-and-metrics).