gitlab: move tests section in the right place, adapting to the service template (authored by anarcat)
@@ -989,24 +989,6 @@ is being exceeded.
By default the token lifetime is 5 minutes. This setting can be changed via the
GitLab admin web interface, in the Container registry configuration section.
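The same knob should also be reachable through the application settings API; a minimal sketch, assuming an admin token and the upstream setting name `container_registry_token_expire_delay` (in minutes):

```
# Sketch only: adjust the registry token lifetime over the API instead of
# the web UI. container_registry_token_expire_delay is the upstream name
# for this setting, in minutes; the admin token is a placeholder.
curl --request PUT --header "PRIVATE-TOKEN: <admin-token>" \
  "https://gitlab.torproject.org/api/v4/application/settings?container_registry_token_expire_delay=10"
```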
## Disaster recovery
In case the entire GitLab machine is destroyed, a new server should be
@@ -1866,7 +1848,7 @@ See also [issues YOU have voted on](https://gitlab.com/gitlab-org/gitlab/-/issue
* [copy reference shortcut disappeared](https://gitlab.com/gitlab-org/gitlab/-/issues/432498) (16.6, worked around by
providing a keybinding, <kbd>c r</kbd>)
## Monitoring and metrics
Monitoring right now is minimal: normal host-level metrics like disk
space, CPU usage, web port and TLS certificates are monitored by
@@ -1874,10 +1856,10 @@ Nagios with our normal infrastructure, as a black box.
Prometheus monitoring is built into the GitLab Omnibus package, so it
is *not* configured through our Puppet like other Prometheus
targets. It has still been (manually) integrated into our Prometheus
setup, and Grafana dashboards (see [pager playbook](#pager-playbook)) have been deployed.
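Independently of the exporters, GitLab exposes its own health and metrics endpoints, which can be probed directly; a quick sketch using the standard upstream paths (access is normally limited to the monitoring IP allowlist, so probes from elsewhere may be refused):

```
# Health check endpoints built into GitLab itself (upstream defaults)
curl -sf https://gitlab.torproject.org/-/liveness
curl -sf 'https://gitlab.torproject.org/-/readiness?all=1'

# Prometheus-format metrics endpoint, restricted to the monitoring
# IP allowlist by default
curl -s https://gitlab.torproject.org/-/metrics | head
```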
Another problem with the current monitoring is that some GitLab exporters
are [currently hardcoded](https://gitlab.torproject.org/tpo/tpa/team/-/issues/40077).
We could also use the following tools to integrate alerting into
@@ -1906,7 +1888,25 @@ could break the production server, but it was retired, see
[tpo/tpa/team#41151]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/41151
## Tests
When we perform major maintenance on the service, for example when moving
the VM from one cluster to another, we want to make sure that everything
is still working as expected. This section is a checklist of things to test
in order to gain confidence that the service is fully functional (a scripted
sketch of some of these checks follows the list):
* [ ] logout/login
* [ ] check that all systemd units are healthy
* [ ] run `gitlab-ctl status`
* repository interactions:
  * [ ] cloning
  * [ ] pushing a commit
* [ ] run a CI pipeline with build artifacts
* [ ] pull an image from containers.tpo
* [ ] check that the API is responsive (TODO add example test command)
* [ ] look at the web dashboard in the admin section (TODO add URL to that dashboard)
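A minimal sketch of how part of this checklist could be scripted; the `gitlab-02` host name, the scratch project `tpo/tpa/test` and the image path below are assumptions to adapt, not canonical values:

```
#!/bin/sh
set -e

# systemd units and Omnibus-managed services all healthy?
ssh gitlab-02.torproject.org 'systemctl --failed; sudo gitlab-ctl status'

# repository interactions: clone, commit, push (scratch project assumed)
git clone git@gitlab.torproject.org:tpo/tpa/test.git
(cd test && git commit --allow-empty -m 'post-maintenance smoke test' && git push)

# API responsive? an unauthenticated endpoint is enough as a liveness check
curl -sf 'https://gitlab.torproject.org/api/v4/projects?per_page=1' >/dev/null

# pull an image from containers.tpo (image path is an example)
podman pull containers.torproject.org/tpo/tpa/base-images/debian:bookworm
```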
## Logs
GitLab keeps an extensive (excessive?) amount of logs in
`/var/log/gitlab`, which include PII such as IP addresses.
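To inspect those logs interactively, the Omnibus package ships a tailing wrapper; for example:

```
# follow all GitLab service logs under /var/log/gitlab
sudo gitlab-ctl tail

# or a single service, e.g. the Rails logs
sudo gitlab-ctl tail gitlab-rails
```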