Changes

anarcat · 57bd2630
--- a/howto/gitlab.md
+++ b/howto/gitlab.md
@@ -989,24 +989,6 @@ is being exceeded.
 By default the token lifetime is 5 minutes. This setting can be changed via the
 GitLab admin web interface, in the Container registry configuration section.

-## Testing service functionality
-
-When we perform important maintenance on the service, like for example when
-moving the VM from one cluster to another, we want to make sure that everything
-is still working as expected. This section is a checklist of things to test in
-order to gain confidence that everything is still working:
-
-* [ ] logout/login
-* [ ] check if all the systemd services are ok
-* [ ] running gitlab-ctl status
-* repository interactions
-  * [ ] cloning
-  * [ ] pushing a commit
-  * [ ] running a ci pipeline with build artifacts
-* [ ] pulling an image from containers.tpo
-* [ ] checking if the api is responsive (TODO add example test command)
-* [ ] look at the web dashboard in the admin section (TODO add URL to that dashboard)
-
 ## Disaster recovery

 In case the entire GitLab machine is destroyed, a new server should be
@@ -1866,7 +1848,7 @@ See also [issues YOU have voted on](https://gitlab.com/gitlab-org/gitlab/-/issue
   * [copy reference shortcut disappeared](https://gitlab.com/gitlab-org/gitlab/-/issues/432498) (16.6, worked around by
     providing a keybinding, <kbd>c r</kbd>)

-## Monitoring and testing
+## Monitoring and metrics

 Monitoring right now is minimal: normal host-level metrics like disk
 space, CPU usage, web port and TLS certificates are monitored by
@@ -1874,10 +1856,10 @@ Nagios with our normal infrastructure, as a black box.

 Prometheus monitoring is built into the GitLab Omnibus package, so it
 is *not* configured through our Puppet like other Prometheus
-servers. It has still been (manually) integrated in our Prometheus
+targets. It has still been (manually) integrated in our Prometheus
 setup and Grafana dashboards (see [pager playbook](#pager-playbook)) have been deployed.

-One problem with the current monitoring is that the GitLab exporters
+Another problem with the current monitoring is that some GitLab exporters
 are [currently hardcoded](https://gitlab.torproject.org/tpo/tpa/team/-/issues/40077).

 We could also use the following tools to integrate alerting into
@@ -1906,7 +1888,25 @@ could break the production server, but it was retired, see

 [tpo/tpa/team#41151]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/41151

-## Logs and metrics
+## Tests
+
+When we perform important maintenance on the service, like for example when
+moving the VM from one cluster to another, we want to make sure that everything
+is still working as expected. This section is a checklist of things to test in
+order to gain confidence that everything is still working:
+
+* [ ] logout/login
+* [ ] check if all the systemd services are ok
+* [ ] running gitlab-ctl status
+* repository interactions
+  * [ ] cloning
+  * [ ] pushing a commit
+  * [ ] running a ci pipeline with build artifacts
+* [ ] pulling an image from containers.tpo
+* [ ] checking if the api is responsive (TODO add example test command)
+* [ ] look at the web dashboard in the admin section (TODO add URL to that dashboard)
+
+## Logs

 GitLab keeps an extensive (excessive?) amount of logs, in
 `/var/log/gitlab`, which includes PII, including IP addresses.
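Review note on the "Tests" checklist moved in this commit: the item "checking if the api is responsive" still carries a TODO for an example test command. A minimal sketch of such a check follows, under these assumptions, none of which come from the original text: the instance answers unauthenticated requests to the standard `/api/v4/projects` REST endpoint because it hosts public projects, and the helper name `api_is_responsive` and its `fetch` parameter are hypothetical. This may not be the check the authors have in mind.

```python
import json
import urllib.request


def api_is_responsive(base_url, fetch=None, timeout=10):
    """Return True if the projects API answers HTTP 200 with a JSON list.

    Assumption: unauthenticated GET /api/v4/projects is allowed on the
    target instance. `fetch` is a hypothetical hook, a callable taking a
    URL and returning (status, body), so the logic can run offline.
    """
    url = base_url.rstrip("/") + "/api/v4/projects?per_page=1"

    if fetch is None:
        # Default transport: a plain HTTPS GET using the standard library.
        def fetch(target):
            with urllib.request.urlopen(target, timeout=timeout) as resp:
                return resp.status, resp.read()

    try:
        status, body = fetch(url)
        # A healthy API returns a JSON array of projects (possibly empty).
        return status == 200 and isinstance(json.loads(body), list)
    except (OSError, ValueError):
        # OSError covers network and HTTP errors raised by urllib;
        # ValueError covers bodies that are not valid JSON.
        return False
```

Calling `api_is_responsive("https://gitlab.torproject.org")` from a workstation would then give a simple yes/no answer after maintenance. Returning a boolean rather than raising keeps the helper easy to drop into a larger checklist script.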