document what monitoring we have in GitLab now
authored Jan 21, 2021 by anarcat
service/ci.md @ 0e6fcdac
@@ -162,17 +162,34 @@ thing about Jenkins.
## Monitoring and testing
TODO: @ahf how do we monitor the runners? maybe the prometheus
exporter has something? should we hook it inside nagios to get alerts
when runners get overwhelmed?
To test a runner, register it with a single project only, so that it
runs only non-critical jobs. See the
[installation section](#Installation) for details on the setup.
## Logs and metrics
Monitoring is otherwise done through Prometheus, on an as-needed
basis; see the [logs and metrics](#logs-and-metrics) section below.
TODO: do runners keep logs? where? does it matter? any PII?
## Logs and metrics
TODO: how about performance metrics? how do we know when we'll run out
of capacity in the runner network since we don't host the f-droid
stuff?
GitLab runners send logs to syslog and systemd. They contain minimal
private information: the most I could find were Git repository and
Docker image URLs, which do contain usernames. Those end up in
`/var/log/daemon.log`, which gets rotated daily, with a one-week
retention.
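
To repeat that kind of audit, a minimal Python sketch along these lines
could list the URLs that appear in runner log lines. It assumes the
runner's entries contain the string `gitlab-runner` and that the
interesting URLs carry an `http(s)://` scheme; the `runner_urls` and
`audit_log` helper names are made up for this example. Docker image
references written without a scheme would need a different pattern.

```python
import re

# Loose match for http(s) URLs; good enough for a manual audit.
URL_RE = re.compile(r"https?://[^\s\"']+")


def runner_urls(lines, tag="gitlab-runner"):
    """Return the sorted, de-duplicated URLs found in lines mentioning `tag`."""
    found = set()
    for line in lines:
        if tag in line:
            found.update(URL_RE.findall(line))
    return sorted(found)


def audit_log(path="/var/log/daemon.log"):
    """Print every URL logged by the runner in the file at `path`."""
    # errors="replace" avoids crashing on stray non-UTF-8 bytes in the log.
    with open(path, errors="replace") as log:
        for url in runner_urls(log):
            print(url)
```

Calling `audit_log()` on a runner host (reading the file may require
elevated privileges) prints one URL per line, which makes it easier to
spot usernames embedded in repository paths.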
The GitLab instance exports a set of metrics to monitor CI. For
example, `ci_pending_builds` shows the size of the queue,
`ci_running_builds` shows the number of currently running builds,
etc. Those are visible in the
[GitLab grafana dashboard](https://grafana.torproject.org/d/QrDJktiMz/gitlab-omnibus),
particularly in
[this view](https://grafana.torproject.org/d/QrDJktiMz/gitlab-omnibus?orgId=1&refresh=1m&var-node=gitlab-02.torproject.org).
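
The same metrics can also be read outside Grafana through the
Prometheus HTTP API. The following is only a sketch: the Prometheus
base URL is not given in this document and must be supplied by the
caller, and the helper names (`instant_query_url`, `sample_values`,
`fetch`) are made up for this example.

```python
import json
import urllib.parse
import urllib.request


def instant_query_url(base_url, expr):
    """Build a Prometheus instant-query URL for the expression `expr`."""
    query = urllib.parse.urlencode({"query": expr})
    return base_url.rstrip("/") + "/api/v1/query?" + query


def sample_values(payload):
    """Extract the float sample values from an instant-query vector response."""
    if payload.get("status") != "success":
        raise ValueError("query failed: %r" % payload.get("error"))
    # Each sample is {"metric": {...}, "value": [timestamp, "value-as-string"]}.
    return [float(sample["value"][1]) for sample in payload["data"]["result"]]


def fetch(base_url, expr):
    """Run `expr` against the Prometheus server at `base_url` (an assumption)."""
    with urllib.request.urlopen(instant_query_url(base_url, expr), timeout=10) as resp:
        return sample_values(json.load(resp))
```

For instance, `fetch(base_url, "ci_pending_builds")` would return the
current queue size for each matching series, assuming the server is
reachable and the metric is scraped under that name.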
Other metrics might become available in the future: for example,
runners can export their own Prometheus metrics, but currently do
not. The runner hosts themselves are, however, monitored through the
`node-exporter` like all other TPO servers.
TODO: monitor GitLab runners; they can be configured to expose metrics
through a Prometheus exporter, we could hook this in our setup.
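
As a rough idea of what hooking this in could look like, the sketch
below parses the Prometheus text exposition format that such an
exporter would serve. It is not part of the current setup: the metrics
URL is left to the caller because no runner exposes one yet, and the
`parse_metrics` and `scrape` helper names are made up for this example.
The parser is deliberately minimal and does not handle label values
that contain spaces.

```python
import urllib.request


def parse_metrics(text):
    """Parse Prometheus text exposition format into a {series: value} dict.

    Comment lines (# HELP, # TYPE) and blank lines are skipped, and any
    optional trailing timestamp is ignored.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # malformed line, ignore it
        samples[fields[0]] = float(fields[1])
    return samples


def scrape(url):
    """Fetch and parse a metrics page; `url` is an assumption of this sketch."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_metrics(resp.read().decode("utf-8", errors="replace"))
```

In practice Prometheus itself would scrape the endpoint; a script like
this is mostly useful to check by hand that a runner answers before
adding it as a scrape target.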
## Backups