Changes

anarcat · 1872453b
--- a/service/donate.md
+++ b/service/donate.md
@@ -722,8 +722,28 @@ developing the Django app after @kez had gone.

 ## Monitoring and metrics

-<!-- describe how this service is monitored, how security issues and -->
-<!-- upgrades are tracked, see also "Upgrades" above. -->
+The donate site is monitored from [Prometheus](howto/prometheus), both
+at the system level (normal metrics like disk, CPU, memory, etc.) and
+at the application level.
+
+A couple of alerts are configured in Alertmanager, all at the "warning"
+level; they notify IRC if problems come up with the service. All
+of them have runbooks that link to the [pager playbook](#pager-playbook) section
+here.
+
+Monitoring does not yet correctly cover failed transactions, see
+[tpo/web/donate-neo#116](https://gitlab.torproject.org/tpo/web/donate-neo/-/issues/116).
+
+The [donate neo donations](https://grafana.torproject.org/d/f36842c2-af41-48c2-ab71-442307ba2f75/donate-neo-donations) dashboard is the main view of the
+service in Grafana. It shows the state of the CiviCRM kill switch,
+transaction rates, errors, the rate limiter, and exception counts. It
+also has an excerpt of system-level metrics from related servers to
+draw correlations if there are issues with the service.
+
+There are also links, in the top-right corner, to Django-specific dashboards
+that can be used to diagnose performance issues.
+
+See [tpo/web/donate-neo#75](https://gitlab.torproject.org/tpo/web/donate-neo/-/issues/75) for follow-up on missing metrics.

 ## Tests