GitLab CI runner monitoring documentation (#41042) authored by anarcat
...

* [kept artifacts cannot be unkept](https://gitlab.com/gitlab-org/gitlab/-/issues/289954)
* GitLab doesn't track [wait times for jobs](https://gitlab.com/groups/gitlab-org/-/epics/10630); we approximate this
  by tracking the queue size and runner-specific metrics such as
  concurrency limit hits
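
As a rough illustration of that approximation, the sketch below combines queue size with concurrency-limit hits over a series of samples. It is a minimal Python sketch under assumptions: the `Sample` shape and its field names are hypothetical, and collecting the values (for example from `ci_pending_builds` and the runner's own metrics) is left out. A job that is pending while the runner is at its limit is only *likely* waiting on capacity; jobs can also stay pending for other reasons, such as no runner matching their tags.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One point in time (hypothetical shape, for illustration only)."""

    pending_jobs: int  # size of the GitLab queue, e.g. from ci_pending_builds
    running_jobs: int  # jobs the runner is currently executing
    concurrent_limit: int  # the runner's configured concurrency limit


def saturation_report(samples: list[Sample]) -> dict[str, float]:
    """Summarise queue pressure as a stand-in for job wait times.

    A sample counts as a "limit hit" when the runner is at its
    concurrency limit; jobs pending at such a moment are likely
    waiting on runner capacity. This is a proxy, not a measurement.
    """
    if not samples:
        raise ValueError("need at least one sample")
    n = len(samples)
    hits = [s for s in samples if s.running_jobs >= s.concurrent_limit]
    return {
        # fraction of samples where the runner was saturated
        "limit_hit_ratio": len(hits) / n,
        # fraction of samples where it was saturated AND jobs were queued
        "queued_while_saturated_ratio": sum(1 for s in hits if s.pending_jobs > 0) / n,
        # average queue size over the whole window
        "mean_pending": sum(s.pending_jobs for s in samples) / n,
    }
```

A high `queued_while_saturated_ratio` suggests jobs were queuing behind busy runners, which is the situation the missing wait-time metric would otherwise reveal.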

## Monitoring and testing

To test a runner, it can be registered only with a project, to run
...
example, `ci_pending_builds` shows the size of the queue,
etc. Those are visible in the [GitLab Grafana dashboard](https://grafana.torproject.org/d/QrDJktiMz/gitlab-omnibus),
particularly in [this view](https://grafana.torproject.org/d/QrDJktiMz/gitlab-omnibus?orgId=1&refresh=1m&var-node=gitlab-02.torproject.org).
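
Those metrics can also be read straight from Prometheus rather than through Grafana. The following is a minimal Python sketch using the Prometheus HTTP API instant-query endpoint (`/api/v1/query`) and only the standard library. The server URL is a hypothetical placeholder, authentication (if our server requires it) is not handled, and keying results by a "k=v,..." label string is just one convenient choice.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical placeholder: point this at the real Prometheus server.
PROMETHEUS_URL = "https://prometheus.example.org"


def query_url(base: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(payload: str) -> dict[str, float]:
    """Map each series' label set (as "k=v,..." text) to its sample value."""
    body = json.loads(payload)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body.get('error')}")
    data = body["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    out = {}
    for series in data["result"]:
        labels = ",".join(f"{k}={v}" for k, v in sorted(series["metric"].items()))
        # Each "value" is [unix_timestamp, "value as a string"].
        out[labels] = float(series["value"][1])
    return out


def pending_builds(base: str = PROMETHEUS_URL) -> dict[str, float]:
    """Fetch the current ci_pending_builds series (needs network access)."""
    with urlopen(query_url(base, "ci_pending_builds"), timeout=10) as resp:
        return parse_vector(resp.read().decode("utf-8"))
```

`pending_builds()` returns one entry per label set of `ci_pending_builds`; querying `sum(ci_pending_builds)` instead would yield a single total under an empty label key.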

Runners are, naturally, monitored through the `node-exporter` like
all other TPO servers.

Runners also expose metrics through a built-in Prometheus exporter on
a predefined port, accessible only by the Prometheus server. The
Puppet module supports this through the
`gitlab_ci_runner::metrics_server` variable, but we have rolled our
own setup for now; see [issue 41042](https://gitlab.torproject.org/tpo/tpa/team/-/issues/41042).
See also [the upstream documentation](https://docs.gitlab.com/runner/monitoring/README.html)
on runner self-monitoring.

CI metrics are aggregated in the [GitLab CI Overview Grafana
dashboard](https://grafana.torproject.org/d/fd0b2fb2-88d0-4f85-bc86-16164c083b51/gitlab-ci-overview?orgId=1).
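
To spot-check what a runner's built-in exporter publishes, its output can be parsed without extra libraries. The sketch below assumes the standard Prometheus text exposition format and metric names such as `gitlab_runner_jobs` and `gitlab_runner_concurrent`; verify those against the upstream documentation for our runner version. Fetching is deliberately left out: the exporter is only reachable from the Prometheus server, so grab the text from there (GitLab Runner typically listens on port 9252, which may not match our setup). The parser is simplified: label values containing `}` are not supported, and escaped characters in label values are kept as-is.

```python
import re

# One sample line of the Prometheus text format: name{labels} value [timestamp]
_SAMPLE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?:\{(?P<labels>[^}]*)\})?"
    r"\s+(?P<value>\S+)(?:\s+\S+)?$"
)
# label="value" pairs; escape sequences are kept as-is (not unescaped).
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Return (name, labels, value) for each sample, skipping comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and HELP/TYPE comments
        m = _SAMPLE.match(line)
        if m is None:
            continue  # skip lines this simple parser cannot read
        try:
            value = float(m.group("value"))  # also accepts NaN and +Inf
        except ValueError:
            continue  # skip lines whose value is not a number
        labels = dict(_LABEL.findall(m.group("labels") or ""))
        samples.append((m.group("name"), labels, value))
    return samples


def total(samples, metric: str, **match: str) -> float:
    """Sum all samples of `metric` whose labels equal every key=value in `match`."""
    return sum(
        value
        for name, labels, value in samples
        if name == metric and all(labels.get(k) == v for k, v in match.items())
    )
```

For example, `total(parse_metrics(text), "gitlab_runner_jobs", state="running")` would count running jobs across all runners on a host, assuming our runner version exposes a `state` label on that metric.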

## Backups

...