By default the token lifetime is 5 minutes. This setting can be changed in the
GitLab admin web interface, in the Container Registry section of the CI/CD
settings (Admin Area > Settings > CI/CD).

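The same setting is also exposed through GitLab's application settings API as `container_registry_token_expire_delay` (in minutes). Below is a minimal Python sketch, using only the standard library, that reads or updates it. It assumes an admin personal access token is available. The helper names (`settings_request`, `token_lifetime`) are hypothetical and do not belong to any existing TPA tool.

```python
"""Sketch: read or change the container registry token lifetime through the
GitLab application settings API (needs an admin token)."""
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: base URL and token handling are illustrative only.
GITLAB_URL = "https://gitlab.torproject.org"
SETTING = "container_registry_token_expire_delay"  # minutes, default 5


def settings_request(base_url: str, token: str, minutes: Optional[int] = None) -> urllib.request.Request:
    """Build a GET (read) or PUT (update) request for /api/v4/application/settings."""
    url = f"{base_url.rstrip('/')}/api/v4/application/settings"
    data = None
    method = "GET"
    if minutes is not None:
        # Local sanity check only: reject obviously invalid values before sending.
        if minutes < 1:
            raise ValueError("token lifetime must be at least one minute")
        data = urllib.parse.urlencode({SETTING: minutes}).encode()
        method = "PUT"
    return urllib.request.Request(url, data=data, method=method, headers={"PRIVATE-TOKEN": token})


def token_lifetime(base_url: str, token: str, minutes: Optional[int] = None) -> int:
    """Send the request and return the (possibly updated) lifetime in minutes."""
    with urllib.request.urlopen(settings_request(base_url, token, minutes), timeout=10) as response:
        return int(json.load(response)[SETTING])
```

For example, `token_lifetime(GITLAB_URL, token, 15)` would raise the lifetime to 15 minutes and return the new value. Called without `minutes`, it only reads the current value.
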
## Disaster recovery

In case the entire GitLab machine is destroyed, a new server should be

* [copy reference shortcut disappeared](https://gitlab.com/gitlab-org/gitlab/-/issues/432498) (16.6, worked around by
  providing a keybinding, <kbd>c r</kbd>)

## Monitoring and metrics

Monitoring is currently minimal: the usual host-level checks (disk
space, CPU usage) as well as the web port and TLS certificates are
monitored by Nagios with our normal infrastructure, as a black box.

Prometheus monitoring is built into the GitLab Omnibus package, so it
is *not* configured through our Puppet like other Prometheus
targets. It has nevertheless been (manually) integrated into our
Prometheus setup, and Grafana dashboards have been deployed (see the
[pager playbook](#pager-playbook)).

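After maintenance, one way to confirm that the manually added GitLab targets are still being scraped is an instant query for `up` against the Prometheus HTTP API. This is a minimal sketch. The Prometheus URL and the `gitlab.*` job-name pattern are placeholders, because the actual job names used in our setup are not documented here.

```python
"""Sketch: ask Prometheus which GitLab scrape targets are currently down."""
import json
import urllib.parse
import urllib.request
from typing import List, Tuple

# Hypothetical values: placeholders, not the names used in our Prometheus setup.
PROMETHEUS_URL = "https://prometheus.example.org"
QUERY = 'up{job=~"gitlab.*"}'


def query_url(base_url: str, query: str = QUERY) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": query})


def down_targets(payload: dict) -> List[Tuple[str, str]]:
    """Return (job, instance) pairs whose `up` sample is 0."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    down = []
    for sample in payload["data"]["result"]:
        # Vector samples look like {"metric": {labels}, "value": [timestamp, "0" or "1"]}.
        if sample["value"][1] == "0":
            labels = sample["metric"]
            down.append((labels.get("job", "?"), labels.get("instance", "?")))
    return down


def check(base_url: str = PROMETHEUS_URL) -> List[Tuple[str, str]]:
    """Run the query and return the targets Prometheus reports as down."""
    with urllib.request.urlopen(query_url(base_url), timeout=10) as response:
        return down_targets(json.load(response))
```

An empty list from `check()` means every matching target answered its last scrape. Note that a target missing entirely from the configuration will not show up here at all.
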
Another problem with the current monitoring is that some GitLab exporters
are [currently hardcoded](https://gitlab.torproject.org/tpo/tpa/team/-/issues/40077).

We could also use the following tools to integrate alerting into

[tpo/tpa/team#41151]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/41151

## Tests

When we perform important maintenance on the service, for example when
moving the VM from one cluster to another, we want to make sure that
everything still works as expected. This section is a checklist of things
to test to gain confidence that the service is healthy:

* [ ] log out and log back in
* [ ] check that all the systemd services are OK
* [ ] run `gitlab-ctl status` and check that every service is running
* repository interactions:
  * [ ] clone a repository
  * [ ] push a commit
  * [ ] run a CI pipeline with build artifacts
* [ ] pull an image from containers.tpo
* [ ] check that the API is responsive (TODO add example test command)
* [ ] look at the web dashboard in the admin section (TODO add URL to that dashboard)

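A few of the checklist items above can be scripted. Below is a rough Python sketch meant to be run as root on the GitLab server itself. It checks the overall systemd state and parses `gitlab-ctl status`. If given an API token, it also queries `/api/v4/version`, which only answers authenticated requests. The parser assumes the usual runit-style `gitlab-ctl status` lines (`run: gitaly: (pid 123) 456s; ...`). The function names are hypothetical; this is not an existing TPA tool, and it does not replace the manual checks (login, cloning, pushing, CI, registry pulls).

```python
"""Rough post-maintenance smoke test for the GitLab server (run as root).
Hypothetical helper, not an existing TPA tool."""
import json
import subprocess
import urllib.request
from typing import Dict, List, Optional


def parse_gitlab_ctl_status(output: str) -> Dict[str, str]:
    """Map each service name to its runit state ("run", "down", ...).

    Assumed line format: "run: gitaly: (pid 123) 456s; run: log: (pid 12) 456s".
    Only the part before the first ";" describes the service itself; the rest
    is about its log process.
    """
    states: Dict[str, str] = {}
    for line in output.splitlines():
        head = line.split(";", 1)[0].strip()
        parts = [part.strip() for part in head.split(":")]
        if len(parts) >= 2 and parts[0] and parts[1]:
            states[parts[1]] = parts[0]
    return states


def stopped_services(output: str) -> List[str]:
    """Names of services whose state is anything other than "run"."""
    return sorted(name for name, state in parse_gitlab_ctl_status(output).items() if state != "run")


def api_version(base_url: str, token: str, timeout: float = 10.0) -> str:
    """Query /api/v4/version, which requires an authenticated request."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v4/version", headers={"PRIVATE-TOKEN": token}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["version"]


def run_checks(base_url: str, token: Optional[str] = None) -> List[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    problems: List[str] = []

    # "running" means systemd is fully up with no failed units; "degraded" means
    # at least one unit failed (list them with `systemctl --failed`).
    state = subprocess.run(
        ["systemctl", "is-system-running"], capture_output=True, text=True
    ).stdout.strip()
    if state != "running":
        problems.append(f"systemd state is {state!r}")

    status = subprocess.run(["gitlab-ctl", "status"], capture_output=True, text=True).stdout
    problems.extend(f"gitlab-ctl: {name} is not running" for name in stopped_services(status))

    if token:
        try:
            api_version(base_url, token)
        except (OSError, ValueError, KeyError) as exc:
            # URLError/HTTPError are OSError subclasses; bad JSON raises ValueError.
            problems.append(f"API check failed: {exc}")
    return problems
```

For example, `run_checks("https://gitlab.torproject.org", token)` returns human-readable problem descriptions. An empty list only means these automated checks found nothing wrong.
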
## Logs

GitLab keeps an extensive (excessive?) amount of logs in
`/var/log/gitlab`, and those logs include PII such as IP addresses.
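As a rough way to see where that PII ends up, the sketch below walks a log directory and counts distinct IPv4 addresses per file. It is an illustration only, and the helper names are hypothetical. It validates candidates with Python's `ipaddress` module but ignores IPv6 addresses and gzip-compressed (`.gz`) files for brevity. It may also count false positives, such as four-part version numbers.

```python
"""Sketch: report which log files contain IPv4 addresses, and how many.
Hypothetical helper, not an existing TPA tool."""
import ipaddress
import re
from pathlib import Path
from typing import Dict, Set

# Dotted-quad candidates; each one is validated with the ipaddress module.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def ipv4_addresses(text: str) -> Set[str]:
    """Return the distinct, valid IPv4 addresses found in text."""
    found: Set[str] = set()
    for candidate in IPV4_CANDIDATE.findall(text):
        try:
            found.add(str(ipaddress.IPv4Address(candidate)))
        except ValueError:
            pass  # e.g. "999.1.1.1" looks like an address but is not one
    return found


def scan(log_dir: str = "/var/log/gitlab") -> Dict[str, int]:
    """Map each file under log_dir to its number of distinct IPv4 addresses.

    Only counts are returned, so the report itself does not repeat the PII.
    """
    root = Path(log_dir)
    counts: Dict[str, int] = {}
    for path in sorted(root.rglob("*")):
        # Skip directories and gzip-compressed rotated logs.
        if not path.is_file() or path.suffix == ".gz":
            continue
        addresses: Set[str] = set()
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line in handle:
                addresses |= ipv4_addresses(line)
        if addresses:
            counts[str(path.relative_to(root))] = len(addresses)
    return counts
```

Returning counts rather than the addresses themselves keeps the report safe to paste into an issue.
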