document what monitoring we have in GitLab now authored by anarcat
@@ -162,17 +162,34 @@ thing about Jenkins.
## Monitoring and testing
TODO: @ahf how do we monitor the runners? maybe the Prometheus
exporter has something? should we hook it into Nagios to get alerts
when runners get overwhelmed?
To test a runner, register it with a specific project only, so that
only non-critical jobs run against it. See the [installation section](#Installation) for
details on the setup.
## Logs and metrics
Monitoring is otherwise done through Prometheus, on an as-needed basis;
see the [logs and metrics](#logs-and-metrics) section below.
TODO: do runners keep logs? where? does it matter? any PII?
## Logs and metrics
TODO: how about performance metrics? how do we know when we'll run out
of capacity in the runner network since we don't host the f-droid
stuff?
GitLab runners send logs to syslog and systemd. They contain minimal
private information: the most I could find were Git repository and
Docker image URLs, which do contain usernames. Those end up in
`/var/log/daemon.log`, which gets rotated daily, with a one-week
retention.
The GitLab instance exports a set of metrics to monitor CI. For
example, `ci_pending_builds` shows the size of the queue,
`ci_running_builds` shows the number of currently running builds,
etc. Those are visible in the [GitLab Grafana dashboard](https://grafana.torproject.org/d/QrDJktiMz/gitlab-omnibus),
particularly in [this view](https://grafana.torproject.org/d/QrDJktiMz/gitlab-omnibus?orgId=1&refresh=1m&var-node=gitlab-02.torproject.org).
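As a rough illustration of how the queue size could be read outside of Grafana, here is a minimal Python sketch that queries the Prometheus HTTP API for `ci_pending_builds`. The server URL is a placeholder, not the real TPO Prometheus address, and the helper names are made up for this example. It also assumes the metric may be split across several labelled series, which is why the values are summed.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical address: replace with the actual Prometheus server.
PROMETHEUS_URL = "https://prometheus.example.org"


def build_query_url(base_url, expr):
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": expr})
    return base_url.rstrip("/") + "/api/v1/query?" + query


def sum_instant_vector(payload):
    """Sum the sample values of an instant-vector query response."""
    if payload.get("status") != "success":
        raise ValueError("Prometheus query failed: %r" % payload.get("error"))
    # Each result looks like {"metric": {...}, "value": [timestamp, "value"]}.
    return sum(float(sample["value"][1]) for sample in payload["data"]["result"])


def pending_builds(base_url=PROMETHEUS_URL):
    """Return the current size of the CI queue (needs network access)."""
    url = build_query_url(base_url, "ci_pending_builds")
    with urllib.request.urlopen(url, timeout=10) as response:
        return sum_instant_vector(json.load(response))
```

Calling `pending_builds()` against a reachable server returns the queue size as a float; the same helper works for `ci_running_builds` by changing the expression passed to `build_query_url`.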
Other metrics might become available in the future: for example,
runners can export their own Prometheus metrics, but currently do
not. The runner hosts are still monitored through the `node-exporter`, like
all other TPO servers.
TODO: monitor GitLab runners; they can be configured to expose metrics
through a Prometheus exporter, which we could hook into our setup.
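To give an idea of what hooking this in could involve: as far as I know, a runner exposes its metrics over HTTP once `listen_address` is set in its `config.toml`, in the usual Prometheus text format. The Python sketch below parses such output for one metric. The sample text is made up for illustration (including the label names), and the parser assumes samples carry no trailing timestamp, so treat it as a starting point and not as a description of the real output.

```python
# Made-up sample of a runner's /metrics output, for illustration only.
SAMPLE = """\
# HELP gitlab_runner_jobs The current number of running builds.
# TYPE gitlab_runner_jobs gauge
gitlab_runner_jobs{executor_stage="docker_run",state="running"} 2
gitlab_runner_jobs{executor_stage="docker_pulling_image",state="running"} 1
gitlab_runner_concurrent 4
"""


def parse_metric(text, name):
    """Return {label_string: value} for every sample of metric `name`.

    Assumes the text exposition format without trailing timestamps.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The value is the last space-separated field on the line.
        series, _, value = line.rpartition(" ")
        metric, _, labels = series.partition("{")
        if metric != name:
            continue
        samples[labels.rstrip("}")] = float(value)
    return samples


running = sum(parse_metric(SAMPLE, "gitlab_runner_jobs").values())
limit = parse_metric(SAMPLE, "gitlab_runner_concurrent")[""]
print("running jobs: %d of %d" % (running, limit))
```

In practice Prometheus would scrape the endpoint directly; a parser like this is mostly useful for a quick manual check that a runner exports what we expect before adding a scrape job.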
## Backups