... | @@ -701,15 +701,28 @@ runner](https://gitlab.com/gitlab-org/gitlab-runner/-/issues) and a [project pag |
... | @@ -701,15 +701,28 @@ runner](https://gitlab.com/gitlab-org/gitlab-runner/-/issues) and a [project pag |
|
by tracking queue size and with runner-specific metrics like
|
|
by tracking queue size and with runner-specific metrics like
|
|
concurrency limit hits
|
|
concurrency limit hits
|
|
|
|
|
|
## Monitoring and testing
|
|
## Monitoring and metrics
|
|
|
|
|
|
|
|
CI metrics are aggregated in the [GitLab CI Overview Grafana
|
|
|
|
dashboard](https://grafana.torproject.org/d/fd0b2fb2-88d0-4f85-bc86-16164c083b51/gitlab-ci-overview?orgId=1). It features multiple exporter sources:
|
|
|
|
|
|
|
|
1. the GitLab Rails exporter, which gives us the queue size
|
|
|
|
2. the GitLab runner exporters, which show how many jobs are running in
|
|
|
|
parallel (see [the upstream documentation](https://docs.gitlab.com/runner/monitoring/README.html))
|
|
|
|
3. a homemade exporter that queries the GitLab database to extract
|
|
|
|
queue wait times
|
|
|
|
4. and finally the node exporter to show memory usage, load and disk
|
|
|
|
usage
|
|
|
|
|
|
|
|
Note that not all runners registered on GitLab are directly managed by
|
|
|
|
TPA, so they might not show up in our dashboards.
|
|
|
|
|
|
|
|
## Tests
|
|
|
|
|
|
To test a runner, it can be registered only with a project, to run
|
|
To test a runner, it can be registered only with a project, to run
|
|
non-critical jobs against it. See the [installation section](#Installation) for
|
|
non-critical jobs against it. See the [installation section](#Installation) for
|
|
details on the setup.
|
|
details on the setup.
|
|
|
|
|
|
Monitoring is otherwise done through Prometheus, on an as-needed basis,
|
|
|
|
see the [logs and metrics](#logs-and-metrics) section below.
|
|
|
|
|
|
|
|
## Logs and metrics
|
|
## Logs and metrics
|
|
|
|
|
|
GitLab runners send logs to `syslog` and `systemd`. They contain minimal
|
|
GitLab runners send logs to `syslog` and `systemd`. They contain minimal
|
... | @@ -718,22 +731,6 @@ Docker image URLs, which do contain usernames. Those end up in |
... | @@ -718,22 +731,6 @@ Docker image URLs, which do contain usernames. Those end up in |
|
`/var/log/daemon.log`, which gets rotated daily, with a one-week
|
|
`/var/log/daemon.log`, which gets rotated daily, with a one-week
|
|
retention.
|
|
retention.
|
|
|
|
|
|
The GitLab instance exports a set of metrics to monitor CI. For
|
|
|
|
example, `ci_pending_builds` shows the size of the queue,
|
|
|
|
`ci_running_builds` shows the number of currently running builds,
|
|
|
|
etc. Those are visible in the [GitLab grafana dashboard](https://grafana.torproject.org/d/QrDJktiMz/gitlab-omnibus),
|
|
|
|
particularly in [this view](https://grafana.torproject.org/d/QrDJktiMz/gitlab-omnibus?orgId=1&refresh=1m&var-node=gitlab-02.torproject.org).
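
For ad-hoc checks outside Grafana, the same metrics can be read directly from Prometheus. The following is a minimal sketch, not the setup TPA actually uses: it assumes a Prometheus server is reachable over HTTP and relies on the standard Prometheus instant-query endpoint (`/api/v1/query`). The base URL and the helper names (`build_query_url`, `parse_instant_vector`, `query`) are hypothetical; only the metric names `ci_pending_builds` and `ci_running_builds` come from the text above.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical base URL: replace with the address of the real Prometheus server.
PROMETHEUS_URL = "https://prometheus.example.org"


def build_query_url(base_url, promql):
    """Return the Prometheus instant-query URL for a PromQL expression."""
    params = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def parse_instant_vector(payload):
    """Extract (labels, value) pairs from a decoded instant-query response."""
    if payload.get("status") != "success":
        raise ValueError("Prometheus query failed: %r" % payload.get("error"))
    # Each sample looks like {"metric": {...}, "value": [timestamp, "value"]}.
    return [(s["metric"], float(s["value"][1])) for s in payload["data"]["result"]]


def query(base_url, promql, timeout=10):
    """Run an instant query over HTTP and return the parsed samples."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_instant_vector(json.load(resp))
```

Calling `query(PROMETHEUS_URL, "sum(ci_pending_builds)")` would then return the current queue size as a list of `(labels, value)` pairs, and the same call with `ci_running_builds` would return the number of running builds, assuming the server exposes both metrics under those names.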
|
|
|
|
|
|
|
|
Runners are, naturally, monitored through the `node-exporter` like
|
|
|
|
all other TPO servers.
|
|
|
|
|
|
|
|
Runners also expose metrics through a built-in Prometheus exporter on
|
|
|
|
a predefined port, accessible only by the Prometheus server. See also
|
|
|
|
[the upstream documentation](https://docs.gitlab.com/runner/monitoring/README.html) about self-monitoring.
|
|
|
|
|
|
|
|
CI metrics are aggregated in the [GitLab CI Overview Grafana
|
|
|
|
dashboard](https://grafana.torproject.org/d/fd0b2fb2-88d0-4f85-bc86-16164c083b51/gitlab-ci-overview?orgId=1).
|
|
|
|
|
|
|
|
## Backups
|
|
## Backups
|
|
|
|
|
|
This service requires no backups: all configuration should be
|
|
This service requires no backups: all configuration should be
|
... | | ... | |