* [kept artifacts cannot be unkept](https://gitlab.com/gitlab-org/gitlab/-/issues/289954)
* GitLab doesn't track [wait times for jobs](https://gitlab.com/groups/gitlab-org/-/epics/10630);
  we approximate them by tracking the queue size and runner-specific
  metrics such as concurrency limit hits
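Because wait times are not recorded directly, one rough way to turn a queue-size metric into a wait estimate is Little's law, W = L / λ. The mean time a job waits is the mean number of pending jobs (L) divided by the rate at which runners pick jobs up (λ). This is a minimal sketch of that technique under our own assumptions, not how our dashboards compute anything. It only holds when the queue is roughly stable over the averaging window, and the helper name and inputs are hypothetical.

```python
def estimate_wait_seconds(avg_pending_jobs: float, jobs_started_per_second: float) -> float:
    """Estimate the mean queue wait with Little's law (W = L / lambda).

    Hypothetical helper: avg_pending_jobs is the average queue length over a
    window (for example, averaged from a queue-size metric), and
    jobs_started_per_second is the rate at which runners pick jobs up over
    the same window. Only meaningful when the queue is roughly stable.
    """
    if jobs_started_per_second <= 0:
        raise ValueError("throughput must be positive to estimate a wait time")
    return avg_pending_jobs / jobs_started_per_second


# Example: 30 pending jobs on average, with runners picking up 0.5 jobs/s,
# gives a mean wait of 60 seconds.
print(estimate_wait_seconds(30, 0.5))
```

Little's law says nothing about the wait distribution, so a long tail of slow jobs can hide behind a reasonable-looking average.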

## Monitoring and testing

To test a runner, it can be registered only with a project, to run
...
example, `ci_pending_builds` shows the size of the queue,
...
etc. Those are visible in the [GitLab Grafana dashboard](https://grafana.torproject.org/d/QrDJktiMz/gitlab-omnibus),
particularly in [this view](https://grafana.torproject.org/d/QrDJktiMz/gitlab-omnibus?orgId=1&refresh=1m&var-node=gitlab-02.torproject.org).
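As an illustration of reading these GitLab-level metrics outside Grafana, the following Python sketch runs an instant query for `ci_pending_builds` against the standard Prometheus HTTP API (`/api/v1/query`) and turns the JSON response into a mapping of label sets to values. The server URL and helper names are hypothetical, and the labels attached to the series depend on how the GitLab exporter is scraped here.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical Prometheus base URL; substitute the real server address.
PROMETHEUS_URL = "https://prometheus.example.org"


def query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def parse_vector(body: str) -> dict:
    """Map each series' label set to its value in an instant-vector response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    results = {}
    for series in payload["data"]["result"]:
        labels = tuple(sorted(series["metric"].items()))
        # Each sample is [unix_timestamp, "value as a string"].
        results[labels] = float(series["value"][1])
    return results


def fetch_pending_builds(base_url: str = PROMETHEUS_URL) -> dict:
    """Query the current ci_pending_builds value(s); requires network access."""
    url = query_url(base_url, "ci_pending_builds")
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_vector(resp.read().decode("utf-8"))


# Offline example with a made-up response in the API's documented shape.
SAMPLE_RESPONSE = (
    '{"status": "success", "data": {"resultType": "vector", "result": ['
    '{"metric": {"__name__": "ci_pending_builds"}, "value": [1700000000, "12"]}]}}'
)
print(parse_vector(SAMPLE_RESPONSE))
```

Keeping the HTTP call separate from the parsing makes the parser testable offline; in practice Grafana already issues the same queries.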

Runners are, naturally, monitored through the `node-exporter` like
all other TPO servers.

Runners also expose metrics through a built-in Prometheus exporter on
a predefined port, accessible only by the Prometheus server. The
`gitlab_ci_runner` Puppet module supports this through the
`gitlab_ci_runner::metrics_server` variable, but we have rolled our
own setup for now; see [issue 41042](https://gitlab.torproject.org/tpo/tpa/team/-/issues/41042). See also [the upstream
documentation](https://docs.gitlab.com/runner/monitoring/README.html) about self-monitoring.
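For ad-hoc checks from a host that is allowed to reach the exporter, its output can also be read directly. The Python sketch below parses the Prometheus text exposition format and computes how much of the runner's `concurrent` capacity is in use. It assumes the upstream default listen port (9252), a hypothetical host name, and the upstream metric names `gitlab_runner_jobs` and `gitlab_runner_concurrent`; check these against the actual exporter output before relying on it.

```python
import re
import urllib.request

# Assumptions: default runner metrics port 9252 and a hypothetical host name.
RUNNER_METRICS_URL = "http://ci-runner.example.org:9252/metrics"

LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text: str) -> list[tuple[str, dict, float]]:
    """Parse Prometheus text exposition format into (name, labels, value) tuples.

    Minimal sketch: skips comment lines (# HELP / # TYPE), ignores optional
    timestamps, and does not handle '}' inside label values.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            name = line[: line.index("{")]
            labels_part = line[line.index("{") + 1 : line.rindex("}")]
            rest = line[line.rindex("}") + 1 :]
            labels = dict(LABEL_RE.findall(labels_part))
        else:
            name, rest = line.split(None, 1)
            labels = {}
        samples.append((name, labels, float(rest.split()[0])))
    return samples


def concurrency_usage(samples) -> float:
    """Fraction of the runner's `concurrent` slots currently running jobs."""
    # Assumption: gitlab_runner_jobs is split by labels, so sum all series.
    running = sum(v for n, _, v in samples if n == "gitlab_runner_jobs")
    limit = sum(v for n, _, v in samples if n == "gitlab_runner_concurrent")
    return running / limit if limit else 0.0


def fetch_metrics(url: str = RUNNER_METRICS_URL) -> str:
    """Download the raw exporter output; requires network access."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


# Made-up sample output, for illustration only.
SAMPLE = """\
# HELP gitlab_runner_concurrent The current value of concurrent setting
# TYPE gitlab_runner_concurrent gauge
gitlab_runner_concurrent 4
# TYPE gitlab_runner_jobs gauge
gitlab_runner_jobs{runner="abc123",state="running"} 3
"""
print(concurrency_usage(parse_metrics(SAMPLE)))
```

A usage that stays close to 1.0 suggests jobs are waiting for free runner slots, which complements the queue-size metrics GitLab itself exports.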

CI metrics are aggregated in the [GitLab CI Overview Grafana
dashboard](https://grafana.torproject.org/d/fd0b2fb2-88d0-4f85-bc86-16164c083b51/gitlab-ci-overview?orgId=1).

## Backups