Changes

anarcat · 0e6fcdac
--- a/service/ci.md
+++ b/service/ci.md
@@ -162,17 +162,34 @@ thing about Jenkins.

 ## Monitoring and testing

-TODO: @ahf how do we monitor the runners? maybe the prometheus
-exporter has something? should we hook it inside nagios to get alerts
-when runners get overwhelmed? 
+To test a runner, register it with a single project only, and run
+non-critical jobs against it. See the [installation section](#installation) for
+details on the setup.

-## Logs and metrics
+Monitoring is otherwise done through Prometheus, on an as-needed basis;
+see the [logs and metrics](#logs-and-metrics) section below.

-TODO: do runners keep logs? where? does it matter? any PII?
+## Logs and metrics

-TODO: how about performance metrics? how do we know when we'll run out
-of capacity in the runner network since we don't host the f-droid
-stuff?
+GitLab runners send logs to syslog and systemd. They contain minimal
+private information: the most I could find were Git repository and
+Docker image URLs, which do contain usernames. Those end up in
+`/var/log/daemon.log`, which gets rotated daily, with a one-week
+retention.
+
+The GitLab instance exports a set of metrics to monitor CI. For
+example, `ci_pending_builds` shows the size of the queue,
+`ci_running_builds` shows the number of currently running builds,
+etc. Those are visible in the [GitLab Grafana dashboard](https://grafana.torproject.org/d/QrDJktiMz/gitlab-omnibus),
+particularly in [this view](https://grafana.torproject.org/d/QrDJktiMz/gitlab-omnibus?orgId=1&refresh=1m&var-node=gitlab-02.torproject.org).
+
+Other metrics might become available in the future: for example,
+runners can export their own Prometheus metrics, but currently do
+not. The runner hosts themselves are still monitored through the
+`node-exporter`, like all other TPO servers.
+
+TODO: monitor GitLab runners. They can be configured to expose metrics
+through a Prometheus exporter, which we could hook into our setup.

 ## Backups